Abstract. The cutoff phenomenon describes a sharp transition in the
convergence of a family of ergodic finite Markov chains to equilibrium.
Many natural families of chains are believed to exhibit cutoff, and yet
establishing this fact is often extremely challenging. An important such
family of chains is the random walk on $\mathcal{G}(n,d)$, a random $d$-regular graph
on $n$ vertices. It is well known that almost every such graph for $d \ge 3$ is
an expander, and even essentially Ramanujan, implying a mixing-time
of $O(\log n)$. According to a conjecture of Peres, the simple random walk
on $\mathcal{G}(n,d)$ for such $d$ should then exhibit cutoff whp. As a special case
of this, Durrett conjectured that the mixing time of the lazy random
walk on a random 3-regular graph is whp $(6 + o(1))\log_2 n$.

In this work we confirm the above conjectures, and establish cutoff in
total-variation, its location and its optimal window, both for simple and
for non-backtracking random walks on $\mathcal{G}(n,d)$. Namely, for any fixed
$d \ge 3$, the simple random walk on $\mathcal{G}(n,d)$ whp has cutoff at $\frac{d}{d-2}\log_{d-1} n$
with window order $\sqrt{\log n}$. Surprisingly, the non-backtracking random
walk on $\mathcal{G}(n,d)$ whp has cutoff already at $\log_{d-1} n$ with constant window
order. We further extend these results to $\mathcal{G}(n,d)$ for any $d = n^{o(1)}$ that
grows with $n$ (beyond which the mixing time is $O(1)$), where we establish
concentration of the mixing time on one of two consecutive integers.



A finite ergodic Markov chain is said to exhibit cutoff if its distance from 
the stationary measure drops abruptly, over a negligible time period known 
as the cutoff window, from near its maximum to near 0. That is, one has to 
run the Markov chain until the cutoff point in order for it to even slightly 
mix, and yet running it any further would be essentially redundant. 

Let $(X_t)$ be an aperiodic irreducible Markov chain on a finite state space $\Omega$
with transition kernel $P(x,y)$ and stationary distribution $\pi$. The worst-case
total-variation distance to stationarity at time $t$ is defined by

$$ d(t) = \max_{x \in \Omega} \| \mathbb{P}_x(X_t \in \cdot) - \pi \|_{\mathrm{TV}} , $$

where $\mathbb{P}_x$ denotes the probability given $X_0 = x$, and where $\|\mu - \nu\|_{\mathrm{TV}}$, the
total-variation distance of two distributions $\mu, \nu$ on $\Omega$, is given by

$$ \|\mu - \nu\|_{\mathrm{TV}} = \sup_{A \subset \Omega} |\mu(A) - \nu(A)| = \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \nu(x)| . $$

We define $t_{\mathrm{mix}}(\varepsilon)$, the total-variation mixing-time of $(X_t)$ for $0 < \varepsilon < 1$, as

$$ t_{\mathrm{mix}}(\varepsilon) = \min\{ t : d(t) < \varepsilon \} . $$
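As a concrete illustration of the worst-case distance $d(t)$ and the mixing time $t_{\mathrm{mix}}(\varepsilon)$, the following Python sketch evaluates both by brute force for a small chain. It is not part of the argument: the function names are ours, exact rational arithmetic is an implementation choice made to avoid round-off at the threshold, and the example chain (the SRW on the complete graph $K_4$, a 3-regular graph) is chosen purely for convenience.

```python
from fractions import Fraction


def step(dist, P):
    """Advance a distribution (row vector) by one step of the kernel P."""
    n = len(P)
    return [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]


def tv_distance(mu, nu):
    """Total-variation distance: half the l1 distance between mu and nu."""
    return Fraction(1, 2) * sum(abs(a - b) for a, b in zip(mu, nu))


def worst_case_distance(P, pi, t):
    """d(t): maximum over starting states x of the TV distance to pi."""
    n = len(P)
    worst = Fraction(0)
    for x in range(n):
        dist = [Fraction(int(y == x)) for y in range(n)]  # point mass at x
        for _ in range(t):
            dist = step(dist, P)
        worst = max(worst, tv_distance(dist, pi))
    return worst


def mixing_time(P, pi, eps, t_max=1000):
    """t_mix(eps) = min{t : d(t) < eps}, searched up to t_max (our cap)."""
    for t in range(t_max + 1):
        if worst_case_distance(P, pi, t) < eps:
            return t
    raise ValueError("chain did not mix within t_max steps")


# Illustrative chain (an assumption of this sketch): SRW on K_4.
P = [[Fraction(0) if x == y else Fraction(1, 3) for y in range(4)]
     for x in range(4)]
pi = [Fraction(1, 4)] * 4

if __name__ == "__main__":
    for t in range(4):
        print(t, worst_case_distance(P, pi, t))
    print("t_mix(1/4) =", mixing_time(P, pi, Fraction(1, 4)))
```

For this particular chain one can check by hand that $d(t) = \frac34 \cdot 3^{-t}$, so the strict inequality in the definition gives $t_{\mathrm{mix}}(\frac14) = 2$.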

Next, let $(X_t^{(n)})$ be a family of such chains, each with its corresponding
worst-case total-variation distance from stationarity $d_n(t)$, its mixing-times
$t_{\mathrm{mix}}^{(n)}$, etc. We say that this family of chains exhibits cutoff at time $t_{\mathrm{mix}}^{(n)}(\frac14)$
iff the following sharp transition in its convergence to stationarity occurs:

$$ \lim_{n\to\infty} \frac{t_{\mathrm{mix}}^{(n)}(\varepsilon)}{t_{\mathrm{mix}}^{(n)}(1-\varepsilon)} = 1 \quad \text{for any } 0 < \varepsilon < 1 . \tag{1.1} $$

The rate of convergence in (1.1) is addressed by the following: A sequence
$w_n = o\big(t_{\mathrm{mix}}^{(n)}(\frac14)\big)$ is called a cutoff window for the family of chains $(X_t^{(n)})$ if
for any $\varepsilon > 0$ there exists some $c_\varepsilon > 0$ such that for all $n$,

$$ t_{\mathrm{mix}}^{(n)}(\varepsilon) - t_{\mathrm{mix}}^{(n)}(1-\varepsilon) \le c_\varepsilon w_n . \tag{1.2} $$

That is, there is cutoff at time $t_n = t_{\mathrm{mix}}^{(n)}(\frac14)$ with window $w_n$ if and only if

$$ t_{\mathrm{mix}}^{(n)}(s) = t_n + O(w_n) = (1 + o(1))\, t_n \quad \text{for any fixed } 0 < s < 1 , $$

or equivalently, cutoff at time $t_n$ with window $w_n$ occurs if and only if

$$ \lim_{\lambda\to\infty} \liminf_{n\to\infty} d_n(t_n - \lambda w_n) = 1 , $$
$$ \lim_{\lambda\to\infty} \limsup_{n\to\infty} d_n(t_n + \lambda w_n) = 0 . $$

Although many natural families of chains are believed to exhibit cutoff, 
determining that cutoff occurs proves to be an extremely challenging task 
even for fairly simple chains, as it often requires the full understanding of 
the delicate behavior of these chains around the mixing threshold. Before 
reviewing some of the related work in this area, as well as the conjectures 
that our work addresses, we state a few of our main results. 

The focus of this paper is on random walks on a random regular graph,
namely on $G \sim \mathcal{G}(n,d)$, a graph uniformly distributed over the set of all
$d$-regular graphs on $n$ vertices, for $d \ge 3$ and $n$ large. This important
class of random graphs has been extensively studied, among other reasons
due to the remarkable expansion properties of its typical instance. One
useful implication of these expansion properties is the rapid mixing of the
corresponding simple random walk (SRW), the chain whose states are the
vertices of the graph and which moves at each step to a uniformly chosen neighbor.
Namely, the SRW on such a graph has a mixing time of $O(\log n)$ with high
probability (whp), that is, with probability tending to 1 as $n \to \infty$.




Our first result establishes both cutoff and its optimal window for the SRW
on a typical instance of $\mathcal{G}(n,d)$ for any $d \ge 3$ fixed. As we later describe,
this settles conjectures of Durrett [17] and Peres [24] in the affirmative.

Theorem 1. Let $G \sim \mathcal{G}(n,d)$ be a random regular graph for $d \ge 3$ fixed.
Then whp, the simple random walk on $G$ exhibits cutoff at $\frac{d}{d-2}\log_{d-1} n$
with a window of order $\sqrt{\log n}$. Furthermore, for any fixed $0 < s < 1$, the
worst case total-variation mixing time whp satisfies

$$ t_{\mathrm{mix}}(s) = \frac{d}{d-2}\log_{d-1} n - (\Lambda + o(1))\, \Phi^{-1}(s) \sqrt{\log_{d-1} n} , $$

where $\Lambda = \frac{2\sqrt{d(d-1)}}{(d-2)^{3/2}}$ and $\Phi$ is the c.d.f. of the standard normal.
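To get a feel for the two terms in Theorem 1, the following Python sketch evaluates the right-hand side with the $o(1)$ correction dropped. This truncation is our simplification, so the output indicates leading-order behavior only and is not a statement about any finite $n$; the function name is ours, and `NormalDist` is the standard normal from Python's `statistics` module.

```python
import math
from statistics import NormalDist


def srw_mixing_time_estimate(n, d, s):
    """Leading-order value of t_mix(s) from Theorem 1 (o(1) term dropped).

    Assumption of this sketch: the o(1) correction is simply set to zero.
    """
    if d < 3:
        raise ValueError("Theorem 1 is stated for fixed d >= 3")
    if not 0 < s < 1:
        raise ValueError("s must lie strictly between 0 and 1")
    h = math.log(n) / math.log(d - 1)              # log_{d-1} n
    lam = 2 * math.sqrt(d * (d - 1)) / (d - 2) ** 1.5  # the constant Lambda
    quantile = NormalDist().inv_cdf(s)             # Phi^{-1}(s)
    return d / (d - 2) * h - lam * quantile * math.sqrt(h)


if __name__ == "__main__":
    n, d = 2 ** 1000, 3
    for s in (0.1, 0.5, 0.9):
        print(s, round(srw_mixing_time_estimate(n, d, s), 1))
```

Since $\Phi^{-1}(\frac12) = 0$, the value at $s = \frac12$ is exactly the cutoff location $\frac{d}{d-2}\log_{d-1} n$, while smaller $s$ (a stricter mixing requirement) moves the estimate later by a multiple of $\sqrt{\log_{d-1} n}$.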

The essence of the cutoff for the SRW on a typical $G \sim \mathcal{G}(n,d)$ lies in the
behavior of its counterpart, the non-backtracking random walk (NBRW),
that does not traverse the same edge twice in a row (formally defined soon).
Curiously, this chain also exhibits cutoff on $\mathcal{G}(n,d)$ whp, only this time the
cutoff window is constant: (1.2) holds for $w_n = 1$ and $c_\varepsilon$ logarithmic in $1/\varepsilon$:

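Theorem 2. Let $G \sim \mathcal{G}(n,d)$ be a random regular graph for $d \ge 3$ fixed.
Then whp, the non-backtracking random walk on $G$ has cutoff at $\log_{d-1}(dn)$
with a constant-size window. More precisely, for any fixed $\varepsilon > 0$, the worst
case total-variation mixing time whp satisfies

$$ t_{\mathrm{mix}}(1-\varepsilon) \ge \lceil \log_{d-1}(dn) \rceil - \lceil \log_{d-1}(1/\varepsilon) \rceil , $$
$$ t_{\mathrm{mix}}(\varepsilon) \le \lceil \log_{d-1}(dn) \rceil + 3 \lceil \log_{d-1}(1/\varepsilon) \rceil + 4 . $$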
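The two bounds of Theorem 2 are easy to tabulate. The Python sketch below computes them for given $n$, $d$ and $\varepsilon$; the function names and the small tolerance used when taking ceilings of floating-point logarithms are our choices, not part of the statement.

```python
import math


def ceil_log(x, base):
    """Ceiling of log_base(x); the 1e-12 slack (our choice) guards
    against round-off when the logarithm is an exact integer."""
    return math.ceil(math.log(x) / math.log(base) - 1e-12)


def nbrw_mixing_bounds(n, d, eps):
    """Return (lower, upper), where lower bounds t_mix(1 - eps) from below
    and upper bounds t_mix(eps) from above, as in Theorem 2."""
    main = ceil_log(d * n, d - 1)        # ceil(log_{d-1}(dn))
    corr = ceil_log(1 / eps, d - 1)      # ceil(log_{d-1}(1/eps))
    return main - corr, main + 3 * corr + 4


if __name__ == "__main__":
    print(nbrw_mixing_bounds(2000, 3, 0.25))
```

The gap between the two values is $4\lceil\log_{d-1}(1/\varepsilon)\rceil + 4$, which does not depend on $n$; this is the constant-size window.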
Figure 2. Distance from stationarity along time for the
NBRW on a random 3-regular graph on $n = 2000$ vertices.
Red curves represent a $(4\log_{d-1}(1/\varepsilon))$-wide cutoff window.



To gain insight into the above behaviors of the SRW and NBRW on a typical
instance of $\mathcal{G}(n,d)$, note that whp, the random $d$-regular graph is
locally-tree-like, its diameter is $(1 + o(1))\log_{d-1} n$ and this is also the distance
between a typical pair of vertices. In a $d$-regular tree, the height of a SRW,
started at the root, is analogous to a biased 1-dimensional random walk with
speed $(d-2)/d$. Hence, the time it takes this walk to reach height $\log_{d-1} n$ is
concentrated around $\frac{d}{d-2}\log_{d-1} n$ with a standard deviation of order $\sqrt{\log n}$.
Our results establish that at this time, the walk on $\mathcal{G}(n,d)$ is mixed. One of
the keys to showing this is estimating the number of simple paths of length
just beyond $\log_{d-1} n$ between most pairs of vertices (see Lemma 3.5 for a
more precise statement). In comparison, as the NBRW started at the root
of a tree is forbidden from backtracking up, it reaches height $\log_{d-1} n$ after
precisely $\log_{d-1} n$ steps, hence the sharper cutoff window.
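The tree heuristic above can be checked numerically. The following Python sketch simulates only the height of the SRW on an infinite $d$-regular tree and records the first time it reaches a given height, to be compared with the deterministic time taken by the NBRW. It is an illustration of the heuristic, not of the proof: the function names, the seed and the sample size are arbitrary choices of ours.

```python
import random


def srw_height_hitting_time(d, height, rng):
    """Steps until the SRW on a d-regular tree first reaches distance
    `height` from the root (only the height process is simulated)."""
    h = steps = 0
    while h < height:
        # At the root every neighbor is one level further away; elsewhere
        # d - 1 of the d neighbors are further away and one is closer.
        if h == 0 or rng.random() < (d - 1) / d:
            h += 1
        else:
            h -= 1
        steps += 1
    return steps


def nbrw_height_hitting_time(height):
    """The NBRW from the root never moves back up, so it needs exactly
    `height` steps."""
    return height


rng = random.Random(0)          # fixed seed, for reproducibility only
d, height, runs = 3, 50, 2000   # illustrative parameters
samples = [srw_height_hitting_time(d, height, rng) for _ in range(runs)]
mean = sum(samples) / runs

if __name__ == "__main__":
    print("SRW mean hitting time:", mean)
    print("d/(d-2) * height     :", d / (d - 2) * height)
    print("NBRW hitting time    :", nbrw_height_hitting_time(height))
```

The empirical mean should sit close to $\frac{d}{d-2}$ times the height, with individual samples spread over a range of order the square root of the height, whereas the NBRW shows no spread at all.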

Establishing the above theorems requires a careful analysis of the local 
geometry around typical pairs of vertices, via a Poissonization argument. 
Namely, we show that the number of edges between certain neighborhoods 
of two prescribed vertices is roughly Poisson. Similar arguments then allow 
us to formulate analogous results for the case of regular graphs of high 
degree, that is, $\mathcal{G}(n,d)$ where $d$ is allowed to tend to $\infty$ with $n$, up to $n^{o(1)}$.

1.1. Related work. The cutoff phenomenon was first identified for the case
of random transpositions on the symmetric group in [14], and for the case
of the riffle-shuffle and random walks on the hypercube in [2]. In their
seminal paper [3] from 1985, Aldous and Diaconis established cutoff (and
coined the term) for the top-in-at-random card shuffling process. See [13]
and [12] for more on the cutoff phenomenon, as well as [27] for a survey of
this phenomenon for random walks on finite groups.

Unfortunately, there are relatively few examples where cutoff has been 
rigorously shown, whereas many important chains are conjectured to exhibit 
cutoff. Indeed, merely deciding whether a given family of finite Markov 
chains exhibits cutoff or not (without pinpointing the precise cutoff location) 
can already be a formidable task (see [13] for more on this problem).



In 2004, Peres [24] proposed the condition $t_{\mathrm{mix}}(\frac14) \cdot \mathrm{gap} \to \infty$ as a cutoff
criterion, where gap is the spectral gap of the chain (i.e., $\mathrm{gap} = 1 - \lambda$ where
$\lambda$ is the largest nontrivial eigenvalue of the transition kernel). While this
"product-condition" is indeed necessary for cutoff in a family of reversible
chains, there are known examples where this condition holds yet there is no
cutoff (see [12, Section 6]). Nevertheless, Peres conjectured that for many
natural chains the product-condition does imply total-variation cutoff (e.g.,
this was recently verified in [15] for the class of birth-and-death chains).

An important family of chains, mentioned in this context in [24], is SRWs 
on transitive "expander" graphs of fixed degree d (graphs where the second 
eigenvalue of the adjacency matrix is bounded away from d). Chen and 
Saloff-Coste [12] verified that such chains exhibit cutoff when measuring the 
convergence to equilibrium via other (less common) norms, and mentioned 
the remaining open problem of proving total-variation cutoff. 

On the other hand, it is well known that almost every $d$-regular graph
for $d \ge 3$ is an expander (see [9], and also [25] for an analogous statement
under a closely related combinatorial definition of expansion). In fact, it was
shown by Friedman [18] that the second eigenvalue of the adjacency matrix
of $G \sim \mathcal{G}(n,d)$ for $d \ge 3$ is whp $2\sqrt{d-1} + o(1)$, essentially as far from $d$ as
possible. Thus, random regular graphs are a valuable tool for constructing
sparse expander graphs, and furthermore, for any fixed $d \ge 3$, any statement
that holds whp for $\mathcal{G}(n,d)$ also holds for almost every $d$-regular expander.
See [11], [21] and also [28] for more on the thoroughly studied model $\mathcal{G}(n,d)$.

By the above, it follows that for any fixed $d \ge 3$, the mixing time of the
SRW on $G \sim \mathcal{G}(n,d)$ is typically $O(\log n)$, whereas its gap is bounded away
from 0. Hence, if we consider the SRW on graphs $\{G_n \sim \mathcal{G}(n,d)\}$ for some
fixed $d \ge 3$, then the product-condition typically holds, and according to
the above conjecture of Peres, these chains should exhibit cutoff whp.

Figure 3. Estimates on the total-variation distance from
stationarity for SRWs and NBRWs on large 3-regular graphs.
(a) SRW on $\mathcal{G}(2^{1000}, 3)$: asymptotic behavior of $t_{\mathrm{mix}}$ established by Theorem 1.
(b) NBRW on $\mathcal{G}(10^9, 3)$: lower and upper bounds according to Theorem 2.

A special case of this was conjectured by Durrett, following his work with
Berestycki [8] studying the SRW on a random 3-regular graph $G \sim \mathcal{G}(n,3)$.
They showed that at time $c\log_2 n$ the distance of the walk from its starting
point is asymptotically $(\frac{c}{3} \wedge 1)\log_2 n$. This implies a lower bound of $3\log_2 n$
for the asymptotic mixing time of random 3-regular graphs, and in particular,
an asymptotic lower bound of $6\log_2 n$ for the lazy random walk (the
lazy version of a chain with transition kernel $P$ is the chain whose transition
kernel is $\frac12(P + I)$, i.e., in each step it stays in place with probability
$\frac12$, and otherwise it follows the rule of the original chain). In [17], Durrett
conjectured that this latter bound is tight:

Conjecture (Durrett [17, Conjecture 6.3.5]). The mixing time for the lazy
random walk on the random 3-regular graph is asymptotically $6\log_2 n$.

Theorem 1 stated above confirms these conjectures of Peres and Durrett
(one can readily infer an upper bound on the mixing time of the lazy random
walk from Theorem 1). Not only does this theorem establish cutoff and its
location for the SRW on $\mathcal{G}(n,d)$ (an analogous result immediately holds for
the lazy walk), but it also determines the second order term in $t_{\mathrm{mix}}(s)$ for any
$0 < s < 1$ (the term corresponding to the cutoff window of order $\sqrt{\log n}$).

The SRW on $\mathcal{G}(n,d)$ for $d = \lfloor (\log n)^a \rfloor$ and $a > 2$ fixed, starting from $v_1$
(not worst-case), was studied by Hildebrand [20]. He showed that in this
case there is cutoff whp at $(1 + o(1))\log_d n$, and asked whether this also holds
for $a < 2$. As we soon show, the answer to this question is positive, even
from a worst-case starting point and after replacing the $o(1)$ by an additive 2.
To describe this result, we must first discuss the NBRW in further detail.

1.2. Cutoff for the SRW and NBRW. While the SRW of a graph is a
Markov chain on its vertices, the NBRW has the set of directed edges (i.e.,
each edge appears in both orientations) as its state space: it moves from
an edge $(x,y)$ to a uniformly chosen edge $(y,z)$ with $z \ne x$. However,
in most applications for NBRWs on regular graphs (see, e.g., [7] and the
references therein), one often considers the projection of this chain onto the
currently visited vertex (i.e., $(x,y) \mapsto y$), as it also converges to the uniform
distribution on the vertices, and can thus be compared to the SRW.
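The transition rule of the NBRW on directed edges, together with its projection onto the currently visited vertex, can be sketched in a few lines of Python. The adjacency-list representation, the function names and the example graph ($K_4$) are assumptions of this sketch; it also assumes a simple graph with minimum degree at least 2, so that excluding the vertex $x$ is the same as excluding the reversed edge and a non-backtracking move always exists.

```python
import random


def nbrw_step(adj, edge, rng):
    """From the directed edge (x, y), move to a uniformly chosen directed
    edge (y, z) with z != x. Assumes a simple graph with min degree >= 2."""
    x, y = edge
    choices = [z for z in adj[y] if z != x]
    return (y, rng.choice(choices))


def nbrw_path(adj, start_edge, steps, rng):
    """Run the NBRW for `steps` steps and return the visited directed edges."""
    path = [start_edge]
    for _ in range(steps):
        path.append(nbrw_step(adj, path[-1], rng))
    return path


# Illustrative 3-regular graph: the complete graph K_4.
adj = {v: [w for w in range(4) if w != v] for v in range(4)}
rng = random.Random(1)                   # fixed seed, for reproducibility
path = nbrw_path(adj, (0, 1), 20, rng)
vertices = [y for (_, y) in path]        # projection (x, y) -> y

if __name__ == "__main__":
    print(path[:5])
    print(vertices[:5])
```

The list `vertices` is the projected walk on the vertex set, which is the object that is usually compared with the SRW.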

In [5] the authors compare the SRW and this projection of the NBRW on 
regular expander graphs, showing that the NBRW has a faster mixing rate 
(see [22] for the definition of this spectral parameter, which for the SRW 
coincides with the largest nontrivial eigenvalue in absolute value). However, 
it was not clear how this spectral data actually translates into a direct 
comparison of the corresponding mixing times. 

Theorems 1 and 2, as a by-product, enable us to directly compare the
mixing times of the SRW and NBRW (not only its projection onto the vertices).
Namely, we obtain that the NBRW indeed mixes faster than the
SRW on almost every $d$-regular graph, by a factor of $d/(d-2)$. Surprisingly,
the delicate result stated in Theorem 2 also shows that once we omit the
"noise" created by the backtracking possibility of the SRW, we are able to
pinpoint the cutoff location up to $O(1)$ (see [19] for an example of such an
$O(1)$ cutoff window related to random walks on the symmetric group).

Recalling that the cutoff window in Theorem 2 had the form $\log_{d-1}(1/\varepsilon)$,
one may wonder what the effect of large degrees would be. Our results
extend to the case of large $d$, all the way up to $d = n^{o(1)}$, beyond which the
mixing time is constant (see, e.g., [16]), hence there is no point in discussing
cutoff. The cutoff window indeed vanishes as $d \to \infty$, and the entire mixing
transition occurs within merely two steps of the chain:

Theorem 3. Let $G \sim \mathcal{G}(n,d)$ where $d = n^{o(1)}$ tends to $\infty$ with $n$. Then
whp, for any fixed $0 < s < 1$, the worst case total-variation mixing time of
the non-backtracking random walk on $G$ satisfies

$$ t_{\mathrm{mix}}(s) \in \big\{ \lceil \log_{d-1}(dn) \rceil ,\ \lceil \log_{d-1}(dn) \rceil + 1 \big\} . $$

That is, the NBRW on $G$ has cutoff whp within two steps of the chain.

As a corollary, the relation between NBRWs and SRWs directly implies an
analogous statement for the SRW on regular graphs of large degree. Here,
the cutoff window becomes $\sqrt{(1/d)\log_d n}$ (compared to $\sqrt{\log n}$ for $d$ fixed),
and if $\frac{\log n}{\log\log n} = o(d)$ then the walk completely coincides with the NBRW.

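Corollary 4. Let $G \sim \mathcal{G}(n,d)$ where $d = n^{o(1)}$ tends to $\infty$ with $n$. Then
whp, the SRW on $G$ has cutoff at $\frac{d}{d-2}\log_{d-1} n$ with a window of $\sqrt{\frac{\log n}{d \log d}}$.
Furthermore, if $d\,\frac{\log\log n}{\log n} \to \infty$, then for any fixed $0 < s < 1$, the worst case
total-variation mixing time of the SRW on $G$ whp satisfies

$$ t_{\mathrm{mix}}(s) \in \big\{ \lceil \log_{d-1}(dn) \rceil ,\ \lceil \log_{d-1}(dn) \rceil + 1 \big\} . $$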
In particular, this answers the above question of Hildebrand (the case of
$d = \lfloor (\log n)^a \rfloor$ for any $a > 0$ fixed) in the affirmative, even from a worst
starting position. Furthermore, instead of a multiplicative $1 + o(1)$, the
cutoff point is determined up to an additive 2 if $a > 1$.

1.3. Random walks on the hypercube. As mentioned above, one of the
original examples of cutoff was for the lazy random walk on the hypercube
$Q_m$, which was shown by Aldous [2] to exhibit cutoff at $\frac12 m \log m$. When
compared to the SRW on $\mathcal{G}(2^m, m)$, guaranteed by Corollary 4 to have cutoff
whp at $(\log 2 + o(1))\, m / \log m$ (in this setting, $d = \log_2 n$ has $d\,\frac{\log\log n}{\log n} \to \infty$),
this demonstrates the slower than typical mixing of the hypercube.
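The gap between the two cutoff locations is already visible for moderate $m$. The Python sketch below evaluates $\frac12 m\log m$ against the leading term $\frac{d}{d-2}\log_{d-1} n$ of Corollary 4 with $n = 2^m$ and $d = m$. The function names are ours, and lower-order terms are ignored on both sides, so the numbers are indicative only.

```python
import math


def hypercube_lazy_cutoff(m):
    """Cutoff location (1/2) m log m of the lazy walk on the hypercube Q_m."""
    return 0.5 * m * math.log(m)


def random_regular_srw_cutoff(m):
    """Leading term d/(d-2) * log_{d-1} n of Corollary 4 for n = 2**m, d = m
    (lower-order terms are ignored in this sketch)."""
    d = m
    log_n = m * math.log(2)
    return d / (d - 2) * log_n / math.log(d - 1)


def asymptotic_form(m):
    """The simplified expression (log 2) m / log m quoted in the text."""
    return math.log(2) * m / math.log(m)


if __name__ == "__main__":
    for m in (10, 20, 100, 1000):
        print(m, round(hypercube_lazy_cutoff(m), 1),
              round(random_regular_srw_cutoff(m), 1),
              round(asymptotic_form(m), 1))
```

The ratio between the first and last columns grows like $\log^2 m$ up to a constant, which quantifies how much slower the hypercube mixes than a typical $m$-regular graph on the same number of vertices.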

1.4. Organization. The rest of the paper is organized as follows. Section 2
contains several preliminary facts on random regular graphs. In Sections 3
and 4 we prove the main theorems, Theorems 1 and 2 resp., and in Section 5
we extend these proofs to the case of $d$ large.

2. Preliminaries 

Let $G = (V, E)$, and let $\vec{E}$ denote the set of directed edges (i.e., $\vec{E}$ contains
both orientations of every edge in $E$). Throughout the paper, we will use
$x, y, \ldots$ for vertices in $V$, as opposed to $\vec{x}, \vec{y}, \ldots$ for directed edges in $\vec{E}$.



2.1. The configuration model. This model, introduced by Bollobás [10]
and sometimes also referred to as the pairing model, provides a convenient
method of both constructing and analyzing a random regular graph. We
next briefly review some of the properties of this model which we will need
for our arguments (see [11], [21] and [28, Section 2] for further information).

Given d and n with dn even, a d-regular (multi-)graph on n vertices is 
constructed via the configuration model as follows. Each vertex is identified 
with d distinct points, and a random perfect matching of all these dn points 
is then produced. The resulting multi-graph is obtained by collapsing every 
$d$-tuple into its corresponding vertex (possibly introducing loops or multiple
edges). Let Simple denote the event that the outcome is a simple graph. 

It can easily be verified that, on the event Simple, the resulting graph is
uniformly distributed over $\mathcal{G}(n,d)$. Crucially, for any fixed $d$,

$$ \mathbb{P}(\mathrm{Simple}) = (1 + o(1)) \exp\Big( \frac{1 - d^2}{4} \Big) , \tag{2.1} $$

where the $o(1)$-term tends to 0 as $n \to \infty$. In particular, as this probability is
uniformly bounded away from 0, any event that holds whp for multi-graphs
constructed via the configuration model, also holds whp for $\mathcal{G}(n,d)$.
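The construction and the estimate (2.1) can be explored with a short simulation. The Python sketch below draws a uniform perfect matching of the $dn$ points by shuffling them and pairing consecutive entries, and then estimates the probability of the event Simple by Monte Carlo. The function names, seeds and sample sizes are our choices, and the comparison with $\exp((1-d^2)/4)$ is only approximate at finite $n$.

```python
import math
import random


def configuration_model(n, d, rng):
    """Return the edge list of a random d-regular multigraph on n vertices,
    obtained from a uniform perfect matching of the d*n points."""
    if (n * d) % 2:
        raise ValueError("d * n must be even")
    # Each vertex owns d points; after a uniform shuffle, pairing the
    # entries at positions (0,1), (2,3), ... is a uniform perfect matching.
    points = [v for v in range(n) for _ in range(d)]
    rng.shuffle(points)
    return [(points[i], points[i + 1]) for i in range(0, n * d, 2)]


def is_simple(edges):
    """True if the multigraph has no loops and no multiple edges."""
    seen = set()
    for u, v in edges:
        key = (min(u, v), max(u, v))
        if u == v or key in seen:
            return False
        seen.add(key)
    return True


def estimate_simple_probability(n, d, trials, rng):
    """Monte Carlo estimate of P(Simple) for the given n and d."""
    hits = sum(is_simple(configuration_model(n, d, rng)) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(1)
    print("estimate :", estimate_simple_probability(200, 3, 1000, rng))
    print("exp(-2)  :", math.exp((1 - 3 ** 2) / 4))
```

Rejection sampling (redrawing until the outcome is simple) then yields a uniform element of $\mathcal{G}(n,d)$, at a cost of roughly $\exp((d^2-1)/4)$ attempts on average for fixed $d$.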

In fact, the statement in equation (2.1) was extended to any $d = o(n^{1/3})$
by McKay [23]. Although the asymptotic behavior of this probability was
thereafter determined for even larger values of $d$ (see [28] for additional
information), in this work we are only concerned with the case $d = n^{o(1)}$,
and hence this result will suffice for our purposes.

A highly useful property of the configuration model is the following: we 
can expose the "pairings" sequentially, that is, given a vertex, we reveal the 
d neighbors of its corresponding points one by one, and so on. This allows 
us to "explore our way" into the graph, while constantly maintaining the 
uniform distribution over the pairings of the remaining unmatched points. 

2.2. Neighborhoods and tree excess. We need the following definitions
with respect to a given graph $G = (V, E)$. Let $\mathrm{dist}(u,v) = \mathrm{dist}_G(u,v)$ denote
the distance between two vertices $u, v \in V$ in this graph. For any vertex
$u \in V$ and integer $t$, the $t$-radius neighborhood of $u$, denoted by $B_t(u)$, and
its (vertex) boundary $\partial B_t(u)$, are defined as

$$ B_t(u) = \{ v \in V : \mathrm{dist}(u,v) \le t \} , \qquad \partial B_t(u) = B_t(u) \setminus B_{t-1}(u) . \tag{2.2} $$

The abbreviated form $B_t$ will be used whenever the identity of $u$ becomes
clear from the context. The tree excess of $B_t$, denoted by $\mathrm{tx}(B_t)$, is the
maximum number of edges that can be deleted from the induced subgraph
on $B_t$ while keeping it connected (i.e., the number of extra edges in that
induced subgraph beyond $|B_t| - 1$).
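For a given graph, the ball $B_t(u)$ and its tree excess are straightforward to compute by breadth-first search. The Python sketch below does so for a simple graph stored as adjacency lists; the representation and the function names are assumptions of this sketch.

```python
from collections import deque


def ball(adj, u, t):
    """Vertices at graph distance at most t from u (breadth-first search)."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == t:
            continue  # do not explore beyond radius t
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)


def tree_excess(adj, u, t):
    """tx(B_t(u)): edges of the induced subgraph on B_t(u) beyond |B_t| - 1.

    Assumes a simple graph, so every induced edge is seen exactly twice."""
    b = ball(adj, u, t)
    edges = sum(1 for x in b for y in adj[x] if y in b) // 2
    return edges - (len(b) - 1)


# Two small examples: the 5-cycle and the complete graph K_4.
cycle5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
k4 = {v: [w for w in range(4) if w != v] for v in range(4)}

if __name__ == "__main__":
    print(tree_excess(cycle5, 0, 1), tree_excess(cycle5, 0, 2))
    print(tree_excess(k4, 0, 1))
```

On the 5-cycle the ball of radius 1 is a path (tree excess 0), while the ball of radius 2 is the whole cycle (tree excess 1).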

The next lemma demonstrates the well known locally-tree-like properties
of a typical $G \sim \mathcal{G}(n,d)$ for any fixed $d \ge 3$. Its proof follows from a standard
and straightforward application of the above mentioned "exploration
process" for the configuration model.

Lemma 2.1. Let $G \sim \mathcal{G}(n,d)$ for some fixed $d \ge 3$, and let $t = \lfloor \frac15 \log_{d-1} n \rfloor$.
Then whp, $\mathrm{tx}(B_t(u)) \le 1$ for all $u \in V(G)$.

Proof. Choose $u \in V$ uniformly at random, and consider the process where
the neighborhood of $u$ is sequentially exposed level by level, according to the
configuration model. When pairing the vertices of level $i$ (and establishing
level $i + 1$) for some $i \ge 0$, we are matching

$$ m_i \le d \vee (d-1)|\partial B_i| $$

points among a pool of $(1 - o(1))dn$ yet unpaired points. For $1 \le k \le m_i$,
let $\mathcal{F}_{i,k}$ denote the $\sigma$-field generated by the process of sequentially exposing
pairings up to the $k$-th unmatched point in $\partial B_i$. Further let $A_{i,k}$ denote
the event that the newly exposed pair of the $k$-th unmatched point in $\partial B_i$
already belongs to some vertex in $B_{i+1}$. Clearly,

$$ \mathbb{P}(A_{i,k} \mid \mathcal{F}_{i,k}) \le \frac{(m_i - k) + (d-1)(k-1)}{(1 - o(1))dn} \le \frac{(d-1) m_i}{(1 - o(1))dn} \le \frac{m_i}{n} \tag{2.3} $$
(where the last inequality holds for a sufficiently large $n$), and hence the
number of events $\{A_{i,k} : 1 \le k \le m_i\}$ that occur is stochastically dominated
by a binomial random variable with parameters $\mathrm{Bin}(m_i, m_i/n)$. (We say
that $\mu$ stochastically dominates $\nu$, denoted by $\mu \succeq \nu$, if $\int f \, d\mu \ge \int f \, d\nu$ for
every bounded increasing function $f$.) Moreover, since $m_i \le d(d-1)^i$ for
any $0 \le i < t$, it follows that $\sum_{i=0}^{t-1} m_i \le d(d-1)^t$, and the number of
occurrences in the entire set of events $\{A_{i,k} : i < t\}$ can be stochastically
dominated as follows:

$$ \sum_{i=0}^{t-1} \sum_{k=1}^{m_i} 1_{A_{i,k}} \preceq \mathrm{Bin}\Big( d(d-1)^t ,\ \frac{d(d-1)^{t-1}}{n} \Big) . \tag{2.4} $$

Notice that, by definition, the number of such events that occur is exactly
the tree excess of $B_t(u)$. We thus obtain that

$$ \mathbb{P}(\mathrm{tx}(B_t) \ge 2) \le O\Big( \frac{(d-1)^{4t}}{n^2} \Big) = o(n^{-1}) , $$

where the last equality is by the assumption on $t$. Taking a union bound
over all vertices $u \in V$ completes the proof. ■
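The final union bound can be sanity-checked numerically. The Python sketch below uses the elementary estimate $\mathbb{P}(\mathrm{Bin}(m,p) \ge 2) \le \binom{m}{2} p^2$ with the binomial parameters of (2.4) and $t = \lfloor \frac15\log_{d-1} n \rfloor$, and multiplies by $n$ to account for all starting vertices. The function name is ours, and the choice of this particular tail estimate is an assumption of the sketch rather than a quotation of the proof.

```python
import math


def tree_excess_union_bound(n, d):
    """Upper bound on n * P(Bin(m, p) >= 2), with m and p as in (2.4) and
    t = floor((1/5) log_{d-1} n).

    Uses P(Bin(m, p) >= 2) <= C(m, 2) * p**2 (union bound over pairs of
    trials), which is our choice of tail estimate for this illustration."""
    t = math.floor(math.log(n) / math.log(d - 1) / 5)
    m = d * (d - 1) ** t               # number of trials in (2.4)
    p = d * (d - 1) ** (t - 1) / n     # success probability in (2.4)
    return n * math.comb(m, 2) * p * p


if __name__ == "__main__":
    for n in (10 ** 6, 10 ** 9, 10 ** 12):
        print(n, tree_excess_union_bound(n, 3))
```

The printed values tend to 0 as $n$ grows (not necessarily monotonically, because of the floor in $t$), in line with the $o(n^{-1})$ bound per vertex.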

When proving cutoff for the NBRW in Section 4, we will be dealing with
directed edges rather than vertices. The $t$-radius neighborhood of a directed
edge $\vec{x}$, denoted by $B_t(\vec{x})$, and its boundary $\partial B_t(\vec{x})$, then consist of directed
edges, and are defined analogously to (2.2) (with $\mathrm{dist}(\vec{x},\vec{y})$ measuring the
shortest non-backtracking walk from $\vec{x}$ to $\vec{y}$; note that $\mathrm{dist}(\cdot,\cdot)$ is not
necessarily symmetric). The tree excess $\mathrm{tx}(B_t(\vec{x}))$ in this case will refer to the
undirected underlying graph induced on $B_t(\vec{x})$.

2.3. The cover tree of a regular graph. Let $G = (V,E)$ be a $d$-regular
graph and $u \in V$ be some given vertex in $G$. The cover tree of $G$ at $u$ is
a mapping $\varphi : \mathcal{T} \to V$, where $\mathcal{T}$ is a $d$-regular tree with root $\rho$, and the
following holds:

$$ \varphi(\rho) = u , \qquad N_G(\varphi(x)) = \{ \varphi(y) : y \in N_{\mathcal{T}}(x) \} \ \text{ for any } x \in \mathcal{T} , \tag{2.5} $$

where $N_H(u) = \{ v \in V(H) : \mathrm{dist}_H(u,v) = 1 \}$ (i.e., $\partial B_1(u)$ for the graph $H$).
That is, the root of $\mathcal{T}$ is mapped to $u$, and $\varphi$ respects 1-radius neighborhoods.

The following two simple facts will be useful later on. First, there is a
one-to-one correspondence between non-backtracking paths in $G$ starting
from $u$ and non-backtracking paths in $\mathcal{T}$ starting from $\rho$. Second, if $X_t$ is a
simple random walk on $\mathcal{T}$, then $\varphi(X_t)$ is a simple random walk on $G$.
3. Cutoff for the simple random walk

In this section, we prove Theorem 1, which establishes cutoff for the SRW
on a typical random $d$-regular graph for any fixed $d \ge 3$. Throughout this
section, let $d \ge 3$ be some fixed integer, and consider some $G \sim \mathcal{G}(n,d)$.

We need the following definition concerning the locally tree-like geometry. 

Definition 3.1 ($K$-root). We say that a vertex $u \in V$ is a $K$-root if and
only if the induced subgraph on $B_K(u)$ is a tree, that is, $\mathrm{tx}(B_K(u)) = 0$.



Recalling Lemma 2.1, whp every vertex in our graph $G \sim \mathcal{G}(n,d)$ has a
tree excess of at most 1 in its $\lfloor \frac15 \log_{d-1} n \rfloor$-radius neighborhood. The next
simple lemma shows that in such a graph (in fact, a weaker assumption
suffices), a "burn-in" period of $O(\log\log n)$ steps allows the SRW from the
worst-case starting position to reposition itself in a typically "nice" vertex.

Lemma 3.2. Let $K = \lfloor \log_{d-1} \log n \rfloor$, and suppose that every $u \in V$ has
$\mathrm{tx}(B_{5K}(u)) \le 1$. Then for any $u \in V$, the SRW of length $4K$ from $u$
ends at a $K$-root with probability $1 - o(1)$. In particular, there are $n - o(n)$
vertices in $G$ that are $K$-roots.

Proof. If $\mathrm{tx}(B_{5K}(u)) = 0$ then the induced subgraph on $B_{5K}$ is a tree and
the result is immediate.

If $\mathrm{tx}(B_{5K}(u)) = 1$ then the induced subgraph on $B_{5K}$ is a cycle $C$, with
disjoint trees rooted on each of its vertices. Let $X_t$ denote the position of
the random walk at time $t$, and let $\rho_t = \mathrm{dist}(X_t, C)$, that is, the length of
the shortest path between $C$ and $X_t$ in $G$.

If the random walk is on the cycle then in the next step it either leaves
$C$ with probability $\frac{d-2}{d}$, or remains on $C$ with probability $\frac{2}{d}$. Alternatively,
if the random walk is not on $C$, then it moves one step closer to $C$ with
probability $\frac{1}{d}$ and one step further away with probability $\frac{d-1}{d}$. Either way,

$$ \mathbb{E}[\rho_{t+1} - \rho_t \mid X_t] = \frac{d-2}{d} . $$

Therefore, $\rho_t - \frac{d-2}{d}\, t$ is a martingale, and the Azuma-Hoeffding inequality
(cf., e.g., [6]) ensures that

$$ \mathbb{P}\Big( \Big| \rho_{4K} - \rho_0 - \frac{4K(d-2)}{d} \Big| > \frac{K}{d} \Big) \le \exp(-\Omega(K)) = o(1) . $$

We deduce that, whp, $\rho_{4K} \ge \frac{d-2}{d}\, 4K - \frac{K}{d} \ge K$ and hence $X_{4K}$ is a $K$-root.

To obtain the statement on the number of $K$-roots in $G$, suppose we start
from a uniformly chosen vertex. Clearly, the random walk at time $4K$ is
also uniform, thus the probability that a uniformly chosen vertex is not a
$K$-root is $o(1)$, as required. ■
The following lemma demonstrates the control over the local geometry
around a $K$-root with $K = O(\log\log n)$.

Lemma 3.3. Set $R = \lfloor \frac47 \log_{d-1} n \rfloor$ and $K = \lfloor \log_{d-1} \log n \rfloor$. With high
probability, every $K$-root $u$ satisfies

$$ |\partial B_t(u)| \ge (1 - o(1))\, d(d-1)^{t-1} \quad \text{for all } t \le R . $$

Proof. Let $u$ be a uniformly chosen vertex; expose its $K$-neighborhood, and
assume that it is indeed a $K$-root. Following the notation from the proof
of Lemma 2.1, we let $A_{i,k}$ be the event that, in the process of sequentially
matching points, the newly exposed pair of the $k$-th unmatched point in
$\partial B_i$ belongs to a vertex already in $B_{i+1}$. Further recall that, by (2.3) and
the discussion thereafter, the number of events $\{A_{i,k} : 0 \le i < R\}$ that
occur is stochastically dominated by a binomial variable with parameters
$\mathrm{Bin}\big( d(d-1)^R , \frac{d(d-1)^{R-1}}{n} \big)$. Since the expectation of this random variable is

$$ d^2 (d-1)^{2R-1} / n \le O(n^{1/7}) , $$

the number of events $A_{i,k}$ with $0 \le i < R$ that occur is less than $n^{1/6}$ (with
room to spare) with probability at least $1 - \exp(-\Omega(n^{1/6}))$.

Each event $A_{i,k}$ reduces the number of leaves in level $i + 1$ by at most 2
and so reduces the number of leaves in level $t > i$ by at most $2(d-1)^{t-i-1}$
vertices. It follows that for each $0 < t \le R$,

$$ |\partial B_t| \ge d(d-1)^{t-1} - \sum_{i<t} \sum_k 1_{A_{i,k}}\, 2(d-1)^{t-i-1} . \tag{3.1} $$

Set $L = \lfloor \frac15 \log_{d-1} n \rfloor$. As $u$ is a $K$-root, no events of the form $A_{i,k}$ with
$i < K$ occur, and the number of events $A_{i,k}$ which occur with $i < L$ is
exactly $\mathrm{tx}(B_L(u))$, giving

$$ \sum_{i<L} \sum_k 1_{A_{i,k}}\, 2(d-1)^{t-i-1} \le 2(d-1)^{t-K-1}\, \mathrm{tx}(B_L(u)) . $$

Furthermore, by the above discussion on the number of events $\{A_{i,k}\}$ that
occur, we deduce that with probability at least $1 - \exp(-\Omega(n^{1/6}))$

$$ \sum_{i=L}^{t-1} \sum_k 1_{A_{i,k}}\, 2(d-1)^{t-i-1} \le 2(d-1)^{t-L-1} n^{1/6} = o\big( (d-1)^t \big) . $$

Plugging the above in (3.1) we get that with probability $1 - \exp(-\Omega(n^{1/6}))$,

$$ |\partial B_t| \ge (1 - o(1))\, d(d-1)^{t-1} - 2(d-1)^{t-K-1}\, \mathrm{tx}(B_L(u)) , \tag{3.2} $$

and a union bound implies that (3.2) holds for all $K$-roots $u$ and all $t \le R$
except with probability $\exp(-\Omega(n^{1/6}))$.
Finally, Lemma 2.1 asserts that whp every $u$ satisfies $\mathrm{tx}(B_L(u)) \le 1$.
Hence, whp, every $K$-root $u$ satisfies $|\partial B_t| \ge (1 - o(1))\, d(d-1)^{t-1}$ for all
$0 < t \le R$, as required. ■

Let $\partial B_t^*(u)$ denote the set of vertices in $\partial B_t(u)$ with a single (simple)
path of length $t$ to $u$. We next wish to establish an estimate for the typical
number of such vertices, intersected with some other neighborhood $B_{t'}(v)$.

Lemma 3.4. Let $K = \lfloor \log_{d-1} \log n \rfloor$ and $R = \lfloor \frac47 \log_{d-1} n \rfloor$. With high
probability, any two $K$-roots $u$ and $v$ with $\mathrm{dist}(u,v) > 2K$ satisfy

$$ |\partial B_t^*(u) \setminus B_{t+1}(v)| = (1 - o(1))\, d(d-1)^{t-1} \quad \text{for all } t \le R - 1 . $$



Proof. The proof follows the same arguments as the proof of Lemma 3.3,
except now we begin with two randomly chosen vertices $u, v$. Expose $B_K(u)$
and $B_K(v)$, at which point we may assume that both $u$ and $v$ are $K$-roots
and that $\mathrm{dist}(u,v) > 2K$. Next, we sequentially expand the layers

$$ \partial B_i = \{ w \in V : \mathrm{dist}(w, \{u,v\}) = i \} \quad \text{for } K < i \le R . $$

By the above assumption on $u$ and $v$, we have

$$ |\partial B_K| = 2d(d-1)^{K-1} . $$

Repeating essentially the same calculations as those appearing in the proof
of Lemma 3.3 now shows that with probability $1 - \exp(-\Omega(n^{1/6}))$,

$$ |\partial B_t| = (2 - o(1))\, d(d-1)^{t-1} \quad \text{for all } t \le R , \tag{3.3} $$

thus whp, the above holds for all pairs of $K$-roots $u, v$ with $\mathrm{dist}(u,v) > 2K$.

We claim that the statement of the lemma follows directly from (3.3). To
see this, assume that (3.3) indeed holds for $u, v$ as above, and let $t \le R - 1$.
Clearly, at most $d(d-1)^{t-1}$ of the vertices in $\partial B_t$ belong to $\partial B_t(v)$, hence

$$ |\partial B_t(u) \setminus B_t(v)| = (1 - o(1))\, d(d-1)^{t-1} , $$

and similarly,

$$ |\partial B_{t+1}(v) \setminus B_{t+1}(u)| = (1 - o(1))\, d(d-1)^{t} . $$

Therefore,

$$ |\partial B_t(u) \cap B_t(v)| = o\big( d(d-1)^{t-1} \big) , $$
$$ |\partial B_{t+1}(v) \cap B_{t+1}(u)| = o\big( d(d-1)^{t} \big) , $$

and altogether we obtain that

$$ |\partial B_t(u) \cap B_{t+1}(v)| \le |\partial B_t(u) \cap B_t(v)| + |B_t(u) \cap \partial B_{t+1}(v)| = o\big( d(d-1)^t \big) . $$
Since there are at most $d(d-1)^{t-1}$ paths of length $t$ from $u$ to $\partial B_t(u)$, and
since $|\partial B_t(u)| = (1 - o(1))\, d(d-1)^{t-1}$, it then follows that

$$ |\partial B_t(u) \setminus \partial B_t^*(u)| = o\big( d(d-1)^{t-1} \big) . $$

We deduce that $|\partial B_t^*(u) \cap B_{t+1}(v)| = o(d(d-1)^t)$, and the proof follows. ■

Lemma 3.5. Let $K = \lfloor \log_{d-1} \log n \rfloor$ and $T = \lfloor \frac12 \log_{d-1} n \rfloor$. With high
probability, any two $K$-roots $u$ and $v$ with $\mathrm{dist}(u,v) > 2K$ satisfy

$$ S_{2T+\ell}(u,v) \ge (1 - o(1))\, \frac{1}{n}\, d(d-1)^{2T+\ell-1} $$

for all $2K < \ell \le \frac{1}{10} \log_{d-1} n$, where $S_k(u,v)$ denotes the number of simple
paths of length $k$ between $u$ and $v$, and the $o(1)$-term tends to 0 as $n \to \infty$.

Proof. Fix $\ell$ as above and expose the neighborhoods of $u$ and $v$ up to distance

$$ t_u = \big\lceil \tfrac12 (2T + \ell - 1) \big\rceil , \qquad t_v = \big\lfloor \tfrac12 (2T + \ell - 1) \big\rfloor $$

respectively. Notice that this selection gives

$$ 2T + \ell - 1 = t_u + t_v , \qquad 0 \le t_u - t_v \le 1 . $$

We further define

$$ A_u = \partial B_{t_u}^*(u) \setminus B_{t_v}(v) , \qquad A_v = \partial B_{t_v}^*(v) \setminus B_{t_u}(u) . $$

We may now assume that the statement of Lemma 3.4 holds with respect
to the neighborhoods of $u$ and $v$ already revealed (and them alone), that is

$$ |A_u| = (1 - o(1))\, d(d-1)^{t_u - 1} , $$
$$ |A_v| = (1 - o(1))\, d(d-1)^{t_v - 1} . $$

In other words, $A_u$ has $(1 - o(1))\, d(d-1)^{t_u}$ unmatched points and similarly,
$A_v$ has $(1 - o(1))\, d(d-1)^{t_v}$ unmatched points.

Now, sequentially match each of the points in $A_u$, and let $M_{u,v}$ denote
the number of points of $A_u$ matched with points in $A_v$. To obtain an upper
bound on $M_{u,v}$, we once again repeat the arguments of Lemma 2.1, implying
that it is stochastically bounded from above by a binomial variable as follows:

$$ M_{u,v} \preceq \mathrm{Bin}\Big( (d-1)|A_u| ,\ \frac{(d-1)|A_v|}{(1 - o(1))dn} \Big) . $$

Since

$$ \frac{(d-1)^2 |A_u| |A_v|}{dn} \le O(n^{1/10}) , $$

Chernoff bounds (cf., e.g., [6]) give that $M_{u,v} < n^{1/4}$ except with probability
$e^{-\Omega(n^{1/4})}$. We thus assume that indeed $M_{u,v} < n^{1/4}$.

In this case, as we sequentially match points, each point in $A_u$ has at least
$(d-1)(|A_v| - n^{1/4})$ remaining points in $A_v$ which it could potentially be matched to.



CUTOFF FOR RANDOM WALKS ON RANDOM REGULAR GRAPHS 






That is, conditional on previous matchings, each point has probability at least $\frac{(d-1)|A_v| - n^{1/4}}{dn}$ of being matched to a point in $A_v$. It follows that $M_{u,v}$ is stochastically bounded from below by a binomial variable

$$M_{u,v} \succeq \mathrm{Bin}\Big((d-1)|A_u|,\ \frac{(d-1)|A_v| - n^{1/4}}{dn}\Big).$$

Now

$$\frac{(d-1)|A_u|\,\big((d-1)|A_v| - n^{1/4}\big)}{dn} = (1-o(1))\,\frac1n\,d(d-1)^{2T+\ell-1} = \Omega\big(\log^2_{d-1} n\big),$$

and again by Chernoff bounds we have that the number of matchings is at least $(1-o(1))\frac1n d(d-1)^{2T+\ell-1}$ except with probability

exp(-0(log^_ 1 n)) = o(n -3 ) . 

Each matching between a point in $A_u$ and a point in $A_v$ determines a simple path from $u$ to $v$ of length $2T+\ell$, thus

$$S_{2T+\ell}(u,v) \ge M_{u,v} \ge (1-o(1))\,\frac1n\,d(d-1)^{2T+\ell-1}.$$

Taking a union bound over all $u$, $v$ and $\ell$ completes the result.

Proof of Theorem 1. Set $K = \lceil\log_{d-1}\log n\rceil$ and set $T = \lfloor\tfrac12\log_{d-1} n\rfloor$. By Lemma 3.2, after $4K$ steps with high probability the random walk is at a $K$-root. Since we are only seeking to establish $t_{\mathrm{MIX}}$ up to an accuracy of $o(\sqrt{\log_{d-1} n})$ and since $K = o(\sqrt{\log_{d-1} n})$, it is enough to consider the worst case mixing from a $K$-root to establish the result.

Let us assume that the statement of Lemma 3.5 holds. Let $u$ and $v$ be $K$-roots with $\mathrm{dist}(u,v) > 2K$. By Lemma 3.5,

$$S_{2T+\ell}(u,v) \ge \frac{1-o(1)}{n}\,d(d-1)^{2T+\ell-1} \quad\text{for } 2K < \ell < \tfrac1{20}\log_{d-1} n.$$

Now let $\mathcal T$ be the cover tree for $G$ at $u$ with a map $\varphi$, as defined in (2.5). Since each simple path in $G$ corresponds to a distinct simple path in $\mathcal T$,

$$\#\{w\in\mathcal T : \varphi(w) = v,\ \mathrm{dist}(\rho,w) = 2T+\ell\} \ge S_{2T+\ell}(u,v) \ge \frac{1-o(1)}{n}\,d(d-1)^{2T+\ell-1}$$

when $2K < \ell < \tfrac1{20}\log_{d-1} n$. Let $(X_t)$ be a SRW on $\mathcal T$ started from $\rho$ and let $W_t = \varphi(X_t)$ be the corresponding SRW on $G$ started from $u$. Note that, by symmetry, conditioned on $\mathrm{dist}(\rho,X_t) = k$ the random walk is uniform on the $d(d-1)^{k-1}$ points $\{w\in\mathcal T : \mathrm{dist}(\rho,w) = k\}$. In addition, a random walk on a $d$-regular tree with $d\ge 3$ is transient, since the distance from the root is a biased random walk with positive speed. In particular, the random walk returns to $\rho$ only a finite number of times almost surely. If $X_t \ne \rho$ then

$$\mathrm{dist}(X_{t+1},\rho) - \mathrm{dist}(X_t,\rho) = \begin{cases} +1 & \text{with probability } \frac{d-1}{d},\\ -1 & \text{with probability } \frac1d.\end{cases}$$

Therefore, the Central Limit Theorem gives that

$$\frac{\mathrm{dist}(X_t,\rho) - \frac{d-2}{d}\,t}{\frac{2\sqrt{d-1}}{d}\sqrt t} \xrightarrow{d} N(0,1). \tag{3.4}$$
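The centering and scaling in (3.4) can be checked numerically. The following Python sketch (function names and the parameters $d = 3$, $t = 1000$ are illustrative assumptions, not taken from the paper) computes the exact law of the distance from the root for the SRW on the infinite $d$-regular tree and compares its mean and variance with the per-step drift $(d-2)/d$ and per-step variance $4(d-1)/d^2$. The finitely many visits to the root only contribute an $O(1)$ correction.

```python
def step_moments(d):
    """Mean and variance of one step of dist(X_t, rho) away from the root."""
    mean = (d - 1) / d - 1 / d      # equals (d-2)/d
    var = 1 - mean ** 2             # equals 4(d-1)/d^2
    return mean, var


def distance_law(d, t):
    """Exact law of dist(rho, X_t) for SRW on the infinite d-regular tree."""
    law = [0.0] * (t + 2)
    law[0] = 1.0
    for _ in range(t):
        nxt = [0.0] * (t + 2)
        for k, mass in enumerate(law):
            if mass == 0.0:
                continue
            if k == 0:
                nxt[1] += mass                      # the root always steps to level 1
            else:
                nxt[k + 1] += mass * (d - 1) / d    # step away from the root
                nxt[k - 1] += mass / d              # step towards the root
        law = nxt
    return law


if __name__ == "__main__":
    d, t = 3, 1000
    drift, var = step_moments(d)
    law = distance_law(d, t)
    mean_t = sum(k * p for k, p in enumerate(law))
    var_t = sum((k - mean_t) ** 2 * p for k, p in enumerate(law))
    print("mean:", mean_t, "drift * t:", drift * t)
    print("variance / t:", var_t / t, "per-step variance:", var)
```

The dynamic program tracks only the distance, which is enough here because of the symmetry noted above: given the distance, the position is uniform on the corresponding level.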



Let $A$ be the set of vertices which are $K$-roots and whose distance from $u$ is greater than $2K$. Since there are at most $d(d-1)^{2K-1} = o(n)$ vertices within distance $2K$ of $u$, and since by Lemma 3.2 there are $n - o(n)$ $K$-roots in total, it follows that $|A| \ge n - o(n)$.

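Combining these arguments, we deduce that if $v\in A$ and

$$t = \frac{d}{d-2}\log_{d-1} n + k\sqrt{\log_{d-1} n} \tag{3.5}$$

then

$$\begin{aligned}
P(W_t = v) &= \sum_{j=0}^{t} P(\mathrm{dist}(\rho,X_t)=j)\,\frac{\#\{w\in\mathcal T:\varphi(w)=v,\ \mathrm{dist}(\rho,w)=j\}}{d(d-1)^{j-1}}\\
&\ge \sum_{\ell=2K}^{\frac1{20}\log_{d-1}n} P(\mathrm{dist}(\rho,X_t)=2T+\ell)\,(1+o(1))\,\frac{\frac1n\,d(d-1)^{2T+\ell-1}}{d(d-1)^{2T+\ell-1}}\\
&= (1+o(1))\,\frac1n\,P\Big(2T+2K\le \mathrm{dist}(\rho,X_t)\le 2T+\tfrac1{20}\log_{d-1}n\Big)\\
&= (1+o(1))\,\frac1n\Big(1-\Phi\Big(-\frac{k}{\Lambda}\Big)\Big),
\end{aligned}$$

where the final equality follows from equation (3.4) and where $\Phi$ is the distribution function of the standard normal and $\Lambda = \frac{2\sqrt{d(d-1)}}{(d-2)^{3/2}}$.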

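Then

$$\begin{aligned}
\|P(W_t\in\cdot) - \pi\|_{TV} &= \sum_{v}\max\Big\{\frac1n - P(W_t = v),\ 0\Big\}\\
&\le \frac{n-|A|}{n} + \sum_{v\in A}\max\Big\{\frac1n - P(W_t=v),\ 0\Big\}\\
&\le o(1) + (1+o(1))\,|A|\,\frac1n\,\Phi\Big(-\frac{k}{\Lambda}\Big) = (1+o(1))\,\Phi\Big(-\frac{k}{\Lambda}\Big).
\end{aligned} \tag{3.6}$$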



It remains to provide a matching lower bound for $\|P(W_t\in\cdot) - \pi\|_{TV}$. To this end, let $M = \log_{d-1} n - K$ and note that

$$\pi(B_M(u)) \le \frac1n\,d(d-1)^{M-1} = o(1).$$

If $w\in\mathcal T$ and $\mathrm{dist}(\rho,w)\le M$ then $\varphi(w)\in B_M(u)$. For the same choice of $t$ as given in (3.5), equation (3.4) gives that

$$P(\mathrm{dist}(X_t,\rho)\le M) = (1+o(1))\,\Phi\Big(-\frac{k}{\Lambda}\Big),$$

and so

$$P(W_t\in B_M(u)) \ge (1+o(1))\,\Phi\Big(-\frac{k}{\Lambda}\Big).$$

It follows that

$$\|P(W_t\in\cdot)-\pi\|_{TV} \ge P(W_t\in B_M(u)) - \pi(B_M(u)) \ge (1+o(1))\,\Phi\Big(-\frac{k}{\Lambda}\Big). \tag{3.7}$$

Combining equations (3.6) and (3.7) establishes that for any $0<s<1$

$$t_{\mathrm{MIX}}(s) = \frac{d}{d-2}\log_{d-1} n - (\Lambda + o(1))\,\Phi^{-1}(s)\sqrt{\log_{d-1} n},$$

completing the proof. ■
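To get a feel for the size of the two leading terms of the SRW mixing time, $\frac{d}{d-2}\log_{d-1} n - \Lambda\,\Phi^{-1}(s)\sqrt{\log_{d-1} n}$ with $\Lambda = 2\sqrt{d(d-1)}/(d-2)^{3/2}$, here is a minimal Python sketch. It simply drops the $o(1)$ correction, so it is an asymptotic heuristic and not a finite-$n$ guarantee; the function names are illustrative assumptions.

```python
import math
from statistics import NormalDist


def lambda_constant(d):
    """Lambda = 2*sqrt(d(d-1)) / (d-2)^(3/2), the window constant."""
    return 2 * math.sqrt(d * (d - 1)) / (d - 2) ** 1.5


def srw_tmix_estimate(n, d, s):
    """Two leading terms of t_MIX(s) for the SRW, with the o(1) term dropped."""
    if d < 3 or not 0 < s < 1:
        raise ValueError("need d >= 3 and 0 < s < 1")
    log_n = math.log(n) / math.log(d - 1)            # log_{d-1} n
    quantile = NormalDist().inv_cdf(s)               # Phi^{-1}(s)
    return d / (d - 2) * log_n - lambda_constant(d) * quantile * math.sqrt(log_n)


if __name__ == "__main__":
    for s in (0.1, 0.5, 0.9):
        print(s, srw_tmix_estimate(10 ** 6, 3, s))
```

For $s = 1/2$ the Gaussian quantile vanishes and the estimate reduces to the cutoff location $\frac{d}{d-2}\log_{d-1} n$; smaller $s$ moves the estimate later by a multiple of $\sqrt{\log_{d-1} n}$.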

4. Cutoff for the non-backtracking random walk 

In this section, we prove Theorem 2 that establishes the cutoff of the NBRW on a typical random $d$-regular graph for $d\ge 3$ fixed. Throughout this section, let $d\ge 3$ be some fixed integer, and consider some $G\sim\mathcal G(n,d)$.

Since the SRW induces a cutoff window of order $\sqrt{\log n}$ merely on account of its backtracking ability, throughout our arguments in Section 3 we could easily afford burn-in periods of order $\log\log n$. On the other hand, our statements for the NBRW establish a constant cutoff window (and moreover, logarithmic in $1/\varepsilon$) and therefore require a far more delicate approach.

Recall that the NBRW is a Markov chain on the set of directed edges; we thus begin by defining a directed $K$-root, analogous to Definition 3.1.

Definition 4.1 (directed K-root). A directed edge $x\in E$ is a directed $K$-root iff the induced subgraph on $B_K(x)$ is a tree, i.e., $\mathrm{tx}(B_K(x)) = 0$.

As before, it is straightforward to show that the directed edges of G have 
locally-tree-like neighborhoods. This is stated by the next lemma. 

Lemma 4.2. Let L = Lgl g(i-i n J- Then whp, tx(B^(x)) < 1 for all 
$x\in E$. In addition, for any $r = r(n)$ and $h = h(n)\to\infty$ arbitrarily slowly, whp at least $dn - h(d-1)^{2r}$ directed edges satisfy $\mathrm{tx}(B_r) = 0$.

Proof. Clearly, if $x = (u,v)\in E$ we have $\mathrm{tx}(B_t(x)) \le \mathrm{tx}(B_t(v))$ for any $t$, thus the first statement of the lemma follows immediately from Lemma 2.1. To show the second statement, recall the exploration process performed in the proof of Lemma 2.1, where $A_{i,k}$ denoted the event that the $k$-th matching generated in the $i$-th layer already belongs to our exposed neighborhood. In our setting, we perform a similar exploration process on a random $x = (u,v)\in E$, only this time the initial vertex $v$ corresponds to $d-1$ points rather than $d$ (having excluded its edge to $u$). Thus, (2.4) translates into
$$\sum_{i=0}^{r-1}\sum_{k} \mathbf 1_{A_{i,k}} \preceq \mathrm{Bin}\Big((d-1)^{r+1},\ \frac{(d-1)^{r}}{n}\Big).$$



It follows that the probability that $\mathrm{tx}(B_r(x)) > 0$ is at most $O\big((d-1)^{2r}/n\big)$, and the expected number of such $x\in E$ is $O\big((d-1)^{2r}\big)$, as required. ■
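The exploration arguments above all run on the configuration model: each vertex owns $d$ points, and a uniform perfect matching of the $dn$ points is revealed pair by pair. A minimal Python sketch of sampling the whole matching at once is given below; the function names are illustrative assumptions, and the sketch samples the full pairing rather than exposing it layer by layer as the proofs do.

```python
import random


def configuration_model(n, d, rng):
    """Sample a uniform perfect matching on d*n points (d points per vertex).

    Returns the list of paired endpoints as (vertex, vertex) tuples; the result
    is a d-regular multigraph that may contain loops and multiple edges.
    """
    if (n * d) % 2 != 0:
        raise ValueError("n * d must be even")
    points = [v for v in range(n) for _ in range(d)]
    rng.shuffle(points)  # pairing consecutive entries of a uniform shuffle is a uniform matching
    return [(points[i], points[i + 1]) for i in range(0, n * d, 2)]


def is_simple(edges):
    """True iff the multigraph has no loops and no repeated edges."""
    seen = set()
    for a, b in edges:
        key = (min(a, b), max(a, b))
        if a == b or key in seen:
            return False
        seen.add(key)
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [configuration_model(50, 3, rng) for _ in range(2000)]
    print("fraction simple:", sum(is_simple(e) for e in samples) / len(samples))
```

Conditioning on `is_simple` recovers a uniform random $d$-regular graph, which is why events must be controlled relative to the probability that the pairing is simple.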



The following lemma, which is the analogue of Lemma 3.2, shows that a small burn-in period typically brings the NBRW to a directed $L$-root for a certain $L$ (and allows us to restrict our attention to such starting positions).

Lemma 4.3. Let e > 0, set K = [log d _ 1 (2/e)] and L = \ L ^\og d _ l n\. Let 
$x\in E$ be such that $\mathrm{tx}(B_{K+L}(x)) \le 1$. Then the non-backtracking walk of length $K$ from $x$ ends at a directed $L$-root with probability at least $1-\varepsilon$.

Proof. Let $H$ be the subgraph formed by the elements (directed edges) of $B_{K+L}(x)$, and notice that the $L$-radius neighborhoods of all possible endpoints $y$ of a non-backtracking walk of length $K$ from $x$ are all contained in $H$. Thus, if $\mathrm{tx}(B_{K+L}(x)) = 0$ then clearly every such endpoint is a directed $L$-root.

Otherwise, consider the undirected underlying graph of $H$. This graph contains a single simple cycle $C$ (by the assumption that $\mathrm{tx}(B_{K+L}(x)) \le 1$), therefore the distance of any vertex $u\in H$ from $C$ is well defined. Let $(W_t)$ denote the non-backtracking random walk started at $W_0 = x$. For some $1\le t<K$, write $W_t = (u,v)$ and $W_{t+1} = (v,w)$. Crucially, we claim that if $\mathrm{dist}(v,C) < \mathrm{dist}(w,C)$, then $W_j$ is a directed $L$-root for all $j\in\{t+1,\ldots,K\}$. Indeed, our subgraph consists of a cycle $C$ with disjoint trees rooted at some of its vertices. Therefore, as soon as the non-backtracking walk makes a single step away from $C$, by definition it can only traverse further away from $C$ with each additional step (as long as it is in $H$).

Furthermore, if $v\notin C$ (that is, $v$ belongs to one of the trees rooted on $C$), then with probability $\frac{1}{d-1}$ the distance to $C$ decreases by 1 in $W_{t+1}$, otherwise it increases by 1. Similarly,

$$P(w\in C \mid u,v\in C) = 1/(d-1).$$

The remaining case is the single step immediately following the first visit to the cycle $C$, if such exists, where the probability of remaining on $C$ (traversing along one of the two possible directions on it) is $\frac{2}{d-1}$. Altogether,

$$P_x(W_K \text{ is not a directed } L\text{-root}) \le 2(d-1)^{-K} \le \varepsilon,$$

as required. ■

The next two lemmas are the analogues of Lemmas 3.3 and 3.4 for directed $K$-roots, and both follow by essentially repeating the original arguments.

Lemma 4.4. Set T = ^log d _ 1 n and K = K{n). Then with probability 
$1 - o(n^{-3})$, every directed $K$-root $x$ satisfies

$$|\partial B_t(x)| \ge \big(1 - (d-1)^{-K} - O(n^{-1/5})\big)(d-1)^t \quad\text{for all } t\le T.$$

Lemma 4.5. Let e > 0, T = Mjlog d _in and L = [g log^-i • With 
probability $1 - o(n^{-3})$, any two directed $L$-roots $x$ and $y$ with $\mathrm{dist}(x,y) > 2L$ satisfy

$$|B_t(x)\cap B_t(y)| \le n^{-1/7}(d-1)^t \quad\text{for all } t\le T.$$

We now turn to prove the Poissonization argument, on which the entire proof of Theorem 2 hinges. Recall that in Theorem 1 we could afford a relatively large (order $\log\log n$) error, which enabled us to apply standard large deviation arguments for the size of cuts between certain neighborhoods of two vertices $u, v$ (as studied in Lemma 3.5). On the other hand, here we can only afford an $O(1)$ error, so the number of paths between two random vertices whose length equals the mixing time will approximately be a Poisson random variable with constant mean. In order to bypass this obstacle and derive the concentration results needed for proving cutoff, we instead consider the joint distribution of $u$ and vertices $v_1,\ldots,v_M$ for some large (poly-logarithmic) $M$. This approach, incorporated in the next proposition, amplifies the error probabilities as required.

Proposition 4.6. Let $\varepsilon > 0$, set

$$K = \lceil 2\log_{d-1}(1/\varepsilon)\rceil, \qquad T = \lceil\log_{d-1}(dn)\rceil, \qquad \mu = (d-1)^{T+K}/dn,$$

and for each $x\in E$, define the random variable $Z = Z(x)$ by

$$P(Z = k) = \frac{1}{dn}\,\big|\{y\in E : \mathcal N_{T+K-1}(x,y) = k\}\big|,$$

where $\mathcal N_\ell(x,y)$ is the number of $\ell$-long non-backtracking paths from $x$ to $y$. Then whp, every $x$ that is a directed $L$-root for $L = \lceil\tfrac16\log_{d-1}(dn)\rceil$ satisfies

$$E\big[\,|(Z(x)/\mu) - 1|\ \big|\ \mathcal F_G\big] \le 2\varepsilon + \frac{3}{\log\log n},$$

where $\mathcal F_G$ is the $\sigma$-field generated by the graph $G\sim\mathcal G(n,d)$.
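The parameters of the proposition are easy to tabulate. The Python sketch below (helper names are illustrative assumptions) computes $K = \lceil 2\log_{d-1}(1/\varepsilon)\rceil$, $T = \lceil\log_{d-1}(dn)\rceil$ and $\mu = (d-1)^{T+K}/dn$, and makes visible the elementary fact that $\mu \ge (d-1)^K \ge 1/\varepsilon^2$, so the mean number of paths is a large constant when $\varepsilon$ is small. Ceilings are found by repeated multiplication rather than floating-point logarithms; the target $1/\varepsilon^2$ is still a float, so borderline values of $\varepsilon$ may round either way.

```python
def ceil_log(base, x):
    """Smallest integer k >= 0 with base**k >= x (base is an integer >= 2)."""
    k, power = 0, 1
    while power < x:
        power *= base
        k += 1
    return k


def poissonization_parameters(n, d, eps):
    """Return (K, T, mu) as in Proposition 4.6, for fixed d >= 3 and 0 < eps < 1."""
    K = ceil_log(d - 1, 1 / eps ** 2)   # ceil(2 log_{d-1}(1/eps))
    T = ceil_log(d - 1, d * n)          # ceil(log_{d-1}(dn))
    mu = (d - 1) ** (T + K) / (d * n)
    return K, T, mu


if __name__ == "__main__":
    for eps in (0.5, 0.1, 0.01):
        K, T, mu = poissonization_parameters(10 ** 6, 3, eps)
        print(eps, K, T, mu, 1 / eps ** 2)
```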



Proof. Condition on the statement of Lemma 4.2 for the choices $r(n) = L$ and $h(n) = \log n$. That is, we assume that there are at least $dn - (\log n)\,n^{1/3}$ directed $L$-roots in $E$.

Let $x$ be a uniformly chosen directed edge, and expose its $L$-radius neighborhood according to the configuration model. As the statement of the proposition only refers to directed $L$-roots, we may at this point assume that $x$ is indeed such an edge (recall that the property of being a directed $L$-root is solely determined by the structure of the induced subgraph on $B_L(x)$, and thus this conditioning does not affect the distribution of the future pairings). With this assumption in mind, continue exposing the neighborhood of $x$ to obtain $B_{2L}(x)$.

Our goal is to show that

$$P\Big(E\big[\,|(Z(x)/\mu) - 1|\ \big|\ \mathcal F_G\big] > 2\varepsilon + \frac{3}{\log\log n}\Big) = o(1/n),$$

in which case a first moment argument will immediately complete the proof of the proposition.

We next consider a uniformly chosen set of $M$ directed edges, $B\subset E$, for some $\log^2 n \le M \le 2\log^2 n$ (to be specified later), by selecting its elements one by one. That is, after $i$ steps ($0\le i<M$), $|B| = i$ and we add a directed edge uniformly chosen over the $dn - i$ remaining elements of $E$. With the addition of every new element, we also develop its $2L$-radius neighborhood.

Notice that, after $i$ steps, there are at most $(\log n)\,n^{1/3}$ directed edges which are not directed $L$-roots in $E$, and furthermore,

$$\Big|B_{2L}(x)\cup\Big(\bigcup_{y\in B} B_{2L}(y)\Big)\Big| \le (i+1)\,n^{1/3} \le M n^{1/3}.$$

Therefore, the probability that the $(i+1)$-th element of $B$ either belongs to one of the existing $2L$-radius neighborhoods, or is not a directed $L$-root, is at most $2Mn^{-2/3}$. Clearly, the probability that 4 such "bad" edges are selected is at most $O(M^4 n^{-8/3}) = o(n^{-2})$.

Altogether, we may assume with probability $1 - o(n^{-2})$ that the set $B$ contains a subset $B' = \{y_1,\ldots,y_{M'}\}$ of size $M'\ge M-3$, such that the following holds:

(i) Every member of $\{x\}\cup B'$ is an $L$-root.

(ii) The pairwise distances of $\{x\}\cup B'$ all exceed $2L$.

For any $y\in E$, let $Z_y = \mathcal N_{T+K-1}(x,y)$, and for any $S\subset E$, let $Z_S$ be the random variable that accepts the value $Z_y$ with probability $1/|S|$ for each $y\in S$. We will use an averaging argument to show that $Z$ can be well approximated by $Z_B$, which in turn is well approximated by $Z_{B'}$.

Setting

$$T_1 = \lfloor (T+K)/2\rfloor, \qquad T_2 = \lceil (T+K)/2\rceil - 2,$$

we wish to develop the $T_1$-radius neighborhood of $x$ as well as the $T_2$-radius neighborhoods of every $y\in B'$. To this end, put

$$U = \partial B_{T_1}(x), \qquad V_i = \partial B_{T_2}(y_i),$$

$$U' = U\setminus\bigcup_i B_{T_2}(y_i), \qquad V_i' = V_i\setminus\Big(B_{T_1}(x)\cup\Big(\bigcup_{j\ne i} B_{T_2}(y_j)\Big)\Big).$$

Recalling Lemma 4.4 (and the fact that $\{x\}\cup B'$ are all directed $L$-roots), with probability $1 - o(n^{-3})$ we have

$$|U| \ge \big(1 - O(n^{-1/6})\big)(d-1)^{T_1},$$

$$|V_i| \ge \big(1 - O(n^{-1/6})\big)(d-1)^{T_2} \quad\text{for all } i\in[M'].$$



Combining this with Lemma 4.5, we deduce that for any sufficiently large n 
the following holds with probability 1 — o(n -3 ): 

1 - 2n - ?) (d - l) Tl < < (d- l) Tl , 

1 - 2n~^) (d - 1) T2 < \Vi\ < (d - 1) T2 for all i G [M'] . 



We will use a standard Poissonization approach in order to approximate the joint distribution of the variables $\{Z_y : y\in B'\}$ (that are fully determined by the graph $G$) using the following set of variables:

$$\tilde Z_{y_i} = \big|\{(u,v)\in E : u\in U',\ v\in V_i'\}\big| \qquad (i\in[M']).$$

We claim that $\tilde Z_{y_i} \le Z_{y_i}$ for all $i$. To see this, recall that $Z_{y_i}$ counts the number of non-backtracking paths of length $T+K-1$ from $x$ to $y_i$. Since $U'$ and $V_i'$ are disjoint subsets of the boundaries of the $T_1$-radius neighborhood of $x$ and the $T_2$-radius neighborhood of $y_i$ respectively, every edge between them corresponds to at least one distinct such path of length $T_1+T_2+1 = T+K-1$. Therefore, by the triangle inequality,

~ M' „ 



ME 



Zb 



1 



M' 

i=l 
AT 

= E 

i=l 



E 

56B 



1 < 



E 

i=i 



1 



Z, 



+ 



M ' Z 



+ E 

Zr, 



+ E f 

Zn 



+ 3 



i=i 



(4.1) 



Let $\tilde Z$ denote the first summand in the last expression:

$$\tilde Z = \frac{1}{M'}\sum_{i=1}^{M'}\big|(\tilde Z_{y_i}/\mu) - 1\big|.$$



The following lemma estimates $\tilde Z$, as well as the second summand in (4.1).

Lemma 4.7. Define $\tilde Z$ and $\tilde Z_{y_i}$ for $i = 1,\ldots,M'$ as above. Then:

$$P\Big(\tilde Z > \varepsilon + \frac{1}{\log\log n}\Big) = o(n^{-2}), \tag{4.2}$$

and

$$P\Big(\frac{1}{M'}\sum_{i=1}^{M'}\frac{\tilde Z_{y_i}}{\mu} < 1 - \varepsilon - \frac{1}{\log\log n}\Big) = o(n^{-2}). \tag{4.3}$$

Proof. We claim that, with probability $1 - o(n^{-2})$, each of the variables $\tilde Z_{y_i}$ is stochastically dominated from below and from above by i.i.d. pairs of binomial variables, $R_i^- \le R_i^+$ (coupled in the obvious manner), defined as:

$$R_i^- \sim \mathrm{Bin}\Big((1-n^{-1/8})(d-1)^{T_2+1},\ p^-\Big), \qquad p^- = (1-n^{-1/8})\,\frac{(d-1)^{T_1+1}}{dn},$$

$$R_i^+ \sim \mathrm{Bin}\Big((d-1)^{T_2+1},\ p^+\Big), \qquad p^+ = (1+n^{-1/4})\,\frac{(d-1)^{T_1+1}}{dn},$$

$$\Delta_i = R_i^+ - R_i^- \ge 0.$$



To see this, consider the configuration model at the starting phase where the vertices in $U'\cup\big(\bigcup_i V_i'\big)$ all have degree 1 (that is, each of these vertices comprises $(d-1)$ points that still wait to be paired), and expose the pairings of the points in the sets $V_j'$ sequentially. Suppose that for all $j<i$ we have already constructed a coupling where $R_j^- \le \tilde Z_{y_j} \le R_j^+$, and next wish to do the same for $\tilde Z_{y_i}$.



By Lemma 4.5, with probability $1 - o(n^{-3})$ there still remain at least $(1-n^{-1/8})(d-1)^{T_2}$ vertices of degree 1 in $V_i'$ and at least $(1-n^{-1/8})(d-1)^{T_1}$ such vertices in $U'$ (otherwise the intersection of either $B(y_i)$ or $B(x)$ with one of $B(y_1),\ldots,B(y_{i-1})$ would contain at least $n^{-1/7}(d-1)^{T_1}$ vertices). We thus have at least $(1-n^{-1/8})(d-1)^{T_2+1}$ unmatched points corresponding to $V_i'$, and at least $(1-n^{-1/8})(d-1)^{T_1+1}$ unmatched points corresponding to $U'$. Associating each such point corresponding to $V_i'$ with a Bernoulli variable, which succeeds if and only if it is matched to $U'$, clearly establishes the coupling of $\tilde Z_{y_i} \ge R_i^-$.

Conversely, $|V_i'| \le (d-1)^{T_2}$ and $|U'| \le (d-1)^{T_1}$, hence there are at most $(d-1)^{T_2+1}$ unmatched points corresponding to $V_i'$ and at most $(d-1)^{T_1+1}$ unmatched points corresponding to $U'$. Since both the $T_1$-radius and the $T_2$-radius neighborhoods of any element contain $O(\sqrt n)$ distinct vertices, the probability of a point corresponding to $V_i'$ being matched to $U'$ is at most

$$\frac{(d-1)^{T_1+1}}{dn - O(M\sqrt n)} \le \frac{(d-1)^{T_1+1}}{(1-o(n^{-1/4}))\,dn}.$$

Therefore, we can readily construct the coupling $\tilde Z_{y_i} \le R_i^+$.

Since it was possible to construct each of the above couplings with probability $1 - o(n^{-3})$, clearly all $M'$ variables can be coupled as above with probability $1 - o(n^{-2})$.

Finally, consider a set of i.i.d. binomial random variables $Q_i$ with means $EQ_i = \mu = (d-1)^{T+K}/dn$, defined by

$$Q_i \sim \mathrm{Bin}\Big((d-1)^{T_2+1},\ \frac{(d-1)^{T_1+1}}{dn}\Big),$$

and coupled in the obvious manner such that $R_i^- \le Q_i \le R_i^+$. Clearly, as $|\tilde Z_{y_i} - Q_i| \le R_i^+ - R_i^- = \Delta_i$, it follows that

$$\tilde Z = \frac{1}{M'}\sum_{i=1}^{M'}\Big|\frac{\tilde Z_{y_i}}{\mu} - 1\Big| \le \frac{1}{M'}\sum_{i=1}^{M'}\Big|\frac{Q_i}{\mu} - 1\Big| + \frac{1}{M'}\sum_{i=1}^{M'}\frac{\Delta_i}{\mu}.$$



Since fi > (d - 1) A > 1/e 2 , for all i G [M'j we have 



E 



< -y / ^4Q i 

I 1 



1 + 0(n-4; 



A* 



< 1 + 



log n 



(4.4) 



< (1 — n s) (n 4+n s)+n sM+rj 4 j = O [n 4 



Qi 
A* 

where the last inequalities in both estimates hold for any sufficiently large 
n. Furthermore, since the {Qi}-s are i.i.d. binomial variables, Chernoff's 
inequality implies that 

/ 1 i!L D \ MM' „/, logn s2\ 

>(_L y bil > i + < e"4do g io g «)^ = e- Q l ( ™^ } J = o(n- 2 ) , 

VM' ^ a loglogn^ v ; > 

i=l ^ 

(4.5) 



log log n 



and an analogous argument for the {Aj}-s (recall that by definition, we 
have Aj = A^ + A", where the {A^}-s and {A"}-s are two sequences of i.i.d. 
binomial variables, independent of each other), combined with the fact that 
EAj//i = O (n" 1/4 ), gives 



1 M ' A 

,M y-A i 

VM' ' /i loglogn 



> 



^ < e Q ( ( logfogn) 2 ) = ( n ~2) _ ( 4 g) 



Define 



1^-1 






1 n 







Since E |(Qi//i) — 1| < (1 + < 2 for large n (with room to spare), we 

deduce that Xt is a martingale with bounded increments: 



\X t+l -X t \ <2 + E 



< 4 . 



Therefore, Azuma's inequality (cf., e.g., pi Chapter 7.2]) implies that 



x m>/M' > j-i-) < 



e a 



M'/(41og logn) 2 



o(n- 2 ) 



(4.7)
Since E |(Qi//x) - 1| < (1 + and



M> 



M 

£ 

i=l 



1^-1 


= E 


9i_i 


1 /i 







M' 



i=l 



the bounds in (4.5) and (4.7) now imply that 

M' 



i=i r 



1 



> e + 



-) = o(n~ 2 ) . 



Together with (4.4) and (4.6), we obtain that (4.2) indeed holds. 

Similarly, since Zy { > R~ for all i, and the {il7~}-s are i.i.d. binomial 
variables with Ei?^ > (1 — e — Sn -1 / 8 )^, we can apply Chernoff's inequality 
to derive a lower bound on Y^iLii^yil I 1 )- Keeping in mind that 



we obtain that (4.3) holds, as 

M 



— y 2 ^- < i 
M i^i ~ 



£ — I < e V^°g!°g" ; J = {n ) 



i 



log log n 



This completes the proof of Lemma 4.7 



We can now combine (4.2) and (4.3) with (4.1), and deduce that the following statement holds with probability $1 - o(n^{-2})$:

$$E\Big|\frac{Z_B}{\mu} - 1\Big| \le 2\varepsilon - 1 + \frac{3}{\log\log n} + \frac{1}{M}\sum_{y\in B}\frac{Z_y}{\mu}. \tag{4.8}$$



To transform the above into the required bound on $Z$, take $M = \lceil\log^2 n\rceil$, and consider a collection of bins, each of size either $M$ or $M+1$, such that the total of their sizes is $dn$. Let $B'_1,\ldots,B'_{\ell_1}$ denote the $M$-element bins, and let $B''_1,\ldots,B''_{\ell_2}$ denote the $(M+1)$-element bins. Next, randomly partition the elements of $E$ into these bins (i.e., each bin $B$ will contain a uniformly chosen set of $|B|$ directed edges).

Since there are at most $\lceil dn/M\rceil = O(n/M)$ different bins, and for each bin the corresponding $Z_B$ satisfies (4.8) with probability at least $1 - o(n^{-2})$,
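we deduce that all the variables $Z_{B'_i}$ and $Z_{B''_i}$ satisfy this inequality with probability at least $1 - o(1/n)$. Therefore, with probability at least $1 - o(1/n)$,

$$\begin{aligned}
E\Big|\frac{Z}{\mu}-1\Big| &= \frac{M}{dn}\sum_{i=1}^{\ell_1} E\Big|\frac{Z_{B'_i}}{\mu}-1\Big| + \frac{M+1}{dn}\sum_{i=1}^{\ell_2}E\Big|\frac{Z_{B''_i}}{\mu}-1\Big|\\
&\le 2\varepsilon - 1 + \frac{3}{\log\log n} + \frac{1}{dn}\sum_{y\in E}\frac{Z_y}{\mu} = 2\varepsilon + \frac{3}{\log\log n},
\end{aligned}$$

where the last equality follows from the fact that

$$\sum_y Z_y = \sum_y \mathcal N_{T+K-1}(x,y) = (d-1)^{T+K} = \mu\,dn.$$

This completes the proof.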



Proof of Theorem 2. Let $(W_t)$ be the non-backtracking random walk, and let $\pi$ denote the stationary distribution on $E$.

The lower bound is a consequence of the following simple claim:

Claim 4.8. Every $d$-regular graph on $n$ vertices satisfies

$$t_{\mathrm{MIX}}(1-\varepsilon) \ge \lceil\log_{d-1}(dn)\rceil - \lceil\log_{d-1}(1/\varepsilon)\rceil \quad\text{for any } 0<\varepsilon<1.$$

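Proof of claim. Let $\varepsilon > 0$ and let $x_0\in E$ be any starting position. Clearly, at time $T = \lfloor\log_{d-1}(\varepsilon dn)\rfloor$ we have

$$|\partial B_T(x_0)| \le (d-1)^T \le \varepsilon dn,$$

and the set $A = E\setminus\partial B_T(x_0)$ has stationary measure at least $1-\varepsilon$. Thus,

$$\|P_{x_0}(W_T\in\cdot) - \pi\|_{TV} \ge |P_{x_0}(W_T\in A) - \pi(A)| \ge 1-\varepsilon,$$

implying that $t_{\mathrm{MIX}}(1-\varepsilon) > T$. The proof now follows from the fact that

$$\lceil\log_{d-1}(dn)\rceil - \lceil\log_{d-1}(1/\varepsilon)\rceil = \lceil\log_{d-1}(dn)\rceil + \lfloor\log_{d-1}\varepsilon\rfloor \le \lceil\log_{d-1}(\varepsilon dn)\rceil \le T+1. \qquad ■$$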
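The counting behind the claim is that after $T$ steps the NBRW can occupy at most $(d-1)^T$ directed edges. The Python sketch below checks this on a small example and evaluates the resulting lower bound; the helper names and the choice of the complete graph $K_4$ (which is 3-regular) are illustrative assumptions, not part of the paper.

```python
def ceil_log(base, x):
    """Smallest integer k >= 0 with base**k >= x (base is an integer >= 2)."""
    k, power = 0, 1
    while power < x:
        power *= base
        k += 1
    return k


def nbrw_lower_bound(n, d, eps):
    """ceil(log_{d-1}(dn)) - ceil(log_{d-1}(1/eps)), the bound of the claim."""
    return ceil_log(d - 1, d * n) - ceil_log(d - 1, 1 / eps)


def nbrw_support(adj, start, steps):
    """Directed edges that a non-backtracking walk from `start` can occupy after `steps` steps."""
    current = {start}
    for _ in range(steps):
        # from (u, v) the walk moves to (v, w) for any neighbor w of v other than u
        current = {(v, w) for (u, v) in current for w in adj[v] if w != u}
    return current


if __name__ == "__main__":
    k4 = {v: [w for w in range(4) if w != v] for v in range(4)}  # 3-regular
    for steps in range(6):
        print(steps, len(nbrw_support(k4, (0, 1), steps)), 2 ** steps)
    print(nbrw_lower_bound(10 ** 6, 3, 0.25))
```

On a graph as small as $K_4$ the support saturates almost immediately; on a large locally tree-like graph the bound $(d-1)^T$ is essentially attained until $T$ is close to $\log_{d-1}(dn)$.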

For the upper bound, let $x_0$ be the worst starting position, and let $x = W_{t_0}$, where $t_0 = \lceil\log_{d-1}(2/\varepsilon)\rceil$. Let $\mathrm{LR}$ denote the event that $x$ is a directed $L$-root, where $L = \lceil\tfrac16\log_{d-1}(dn)\rceil$. Conditioning on the statements of Lemma 4.2 and Lemma 4.3 (and recalling that both hold whp) we obtain that $P_{x_0}(\mathrm{LR}) \ge 1-\varepsilon$.

Condition on the statement of Proposition 4.6, and following its notation, let $Z(x)$ accept the value $\mathcal N_{T+K-1}(x,y)$ with probability $1/dn$, where

$$K = \lceil 2\log_{d-1}(1/\varepsilon)\rceil, \qquad T = \lceil\log_{d-1}(dn)\rceil, \qquad \mu = (d-1)^{T+K}/dn.$$

The following then holds:

$$\begin{aligned}
\sum_{y\in E}\Big|P_x(W_{T+K} = y \mid \mathrm{LR}) - \frac{1}{dn}\Big| &= \sum_k \big|\{y : \mathcal N_{T+K-1}(x,y) = k\}\big|\,\Big|\frac{k}{(d-1)^{T+K}} - \frac{1}{dn}\Big|\\
&= E\big[\,|(Z/\mu) - 1|\ \big|\ \mathcal F_G\big] \le 2\varepsilon + o(1),
\end{aligned} \tag{4.9}$$

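where in the last inequality we applied Proposition 4.6 onto the directed $L$-root $x$ (given the event $\mathrm{LR}$). We deduce that for $t(\varepsilon) = t_0 + T + K$:

$$\begin{aligned}
\Big\|P_{x_0}(W_{t(\varepsilon)}\in\cdot) - \pi\Big\|_{TV} &\le P_{x_0}(\mathrm{LR})\cdot\frac12\sum_{y\in E}\Big|P_x(W_{T+K} = y\mid \mathrm{LR}) - \frac{1}{dn}\Big| + P_{x_0}(\mathrm{LR}^c)\\
&\le \varepsilon + (1-\varepsilon)\,P_{x_0}(\mathrm{LR}^c) + o(1) \le 2\varepsilon - \varepsilon^2 + o(1) < 2\varepsilon,
\end{aligned} \tag{4.10}$$

where the first inequality in the last line is by (4.9), the second one is due to the fact that $P(\mathrm{LR}^c) \le \varepsilon$, and the third inequality holds for sufficiently large values of $n$. Therefore, for any large $n$ we have

$$\begin{aligned}
t_{\mathrm{MIX}}(\varepsilon) \le t(\varepsilon/2) &\le \lceil\log_{d-1}(dn)\rceil + 3\lceil\log_{d-1}(2/\varepsilon)\rceil + \lceil\log_{d-1} 2\rceil\\
&\le \lceil\log_{d-1}(dn)\rceil + 3\lceil\log_{d-1}(1/\varepsilon)\rceil + 4
\end{aligned}$$

(where in the last inequality we used the fact that $d\ge 3$), as required. ■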



5. Cutoff for random regular graphs of large degree 

In this section, we prove Theorem 3 and Corollary 4, which extend our cutoff result for the SRW and NBRW on almost every random regular graph of fixed degree $d\ge 3$ to the case of $d$ large. To prove cutoff for the NBRW, we adapt our original arguments (from the case of $d$ fixed) to the new delicate setting where our error probabilities are required to be exponentially small in $d$. The behavior of the SRW is then obtained as a corollary of this result.

Throughout the section, let $d = d(n)\to\infty$ with $n$, and recall that we further assume that $d = n^{o(1)}$, since otherwise the mixing time is $O(1)$ and cutoff is impossible.

5.1. NBRWs on random regular graphs of large degree. As we will soon show, when $d$ is large we no longer need to deal with $K$-roots (and the locally-tree-like geometry of the starting point of our walk), as all vertices will have sufficient expansion whp. However, the analysis of the configuration model becomes more delicate, as the probability that it produces a simple graph is $(1+o(1))\exp\big(\frac{1-d^2}{4}\big)$ (see (2.1)), which now decays with $n$. Thus, to prove that the probability of an event goes to 0 on $\mathcal G(n,d)$, we must now show that its probability is $o(\exp(-d^2/4))$ in the configuration model.

Lemma 5.1. With high probability, for all $x\in E$ and all $t\le\tfrac47\log_{d-1} n$,

$$|\partial B_t(x)| \ge (1-o(1))(d-1)^t. \tag{5.1}$$



Proof. The proof is an adaptation of Lemma 3.3. Pick a directed edge $x$ uniformly at random and expose its first level. Since we are interested in probabilities conditioned on the graph $G$ being simple, we may assume that $|\partial B_1(x)| = d-1$, that is, there are no self-loops or multiple edges from $x$. We will show that (5.1) holds with probability $1 - o\big(n^{-1}\exp(-d^2/4)\big)$ for the above $x$ in the configuration model. Clearly, for any $t<t'$ we have $|\partial B_t(x)| \ge (d-1)^{t-t'}|\partial B_{t'}(x)|$, hence we can restrict our attention to $\partial B_T(x)$ where $T = \lfloor\tfrac47\log_{d-1} n\rfloor$.



Following the notation in the proof of Lemma 2.1, let $A_{i,k}$ be the event that, in the process of sequentially matching points, the newly exposed pair of the $k$-th unmatched point in $\partial B_i$ belongs to some vertex already in $B_{i+1}$. Further recall that, by (2.3) and the discussion thereafter, the number of events $\{A_{i,k} : 0\le i<T\}$ that occur is stochastically dominated by a binomial variable with parameters $\mathrm{Bin}\big((d-1)^{T+1},\ \frac{(d-1)^T}{n}\big)$. By our choice of $T$, the expectation of this random variable is

$$(d-1)^{2T+1}/n \le d\,n^{1/7} \le n^{1/7+o(1)},$$

hence the number of events $A_{i,k}$ with $0\le i<T$ that occur is less than $n^{1/6}$ (with room to spare) with probability at least $1 - \exp(-\Omega(n^{1/6}))$. Next, set

$$L = \lfloor\tfrac15\log_{d-1} n\rfloor, \qquad p = \lceil 4 + 2d^2/\log n\rceil = o(d^2).$$

As before, we can stochastically dominate the number of events $A_{i,k}$ that occur in the first $L$ levels, $\{A_{i,k} : 0\le i<L\}$, by a binomial variable $X_L\sim\mathrm{Bin}\big((d-1)^{L+1},\ \frac{(d-1)^L}{n}\big)$. Since the expected value of $X_L$ is

$$(d-1)^{2L+1}/n = o(n^{-1/2}),$$

and since $L\to\infty$ with $n$ (by our assumption on $d$), it is easy to verify that

$$P(X_L\ge p) = (1+o(1))\,P(X_L = p) = o(n^{-p/2}).$$

Recalling the definition of $p$, it now follows that the number of events $A_{i,k}$ with $0\le i<L$ that occur is less than $p$ except with probability $o(n^{-2}e^{-d^2})$.

Each event $A_{i,k}$ reduces the number of leaves in level $i+1$ by at most 2, hence it reduces the number of leaves in level $t>i$ by at most $2(d-1)^{t-i-1}$ vertices. It then follows that for each $0\le t\le T$,

$$|\partial B_t(x)| \ge (d-1)^t - \sum_{i<t}\sum_k \mathbf 1_{A_{i,k}}\,2(d-1)^{t-i-1}. \tag{5.2}$$

As $|\partial B_1(x)| = d-1$, there are no events of the form $A_{0,k}$. Therefore, by the discussion above, with probability $1 - o(n^{-2}e^{-d^2})$ we have

$$\sum_{i<L}\sum_k \mathbf 1_{A_{i,k}}\,2(d-1)^{t-i-1} \le 2(d-1)^{t-2}\,p = o\big((d-1)^t\big).$$

Furthermore, by the above discussion on the number of events $A_{i,k}$ that occur, we deduce that with probability at least $1 - \exp(-\Omega(n^{1/6}))$

$$\sum_{i=L}^{t-1}\sum_k \mathbf 1_{A_{i,k}}\,2(d-1)^{t-i-1} \le 2(d-1)^{t-L-1}\,n^{1/6} = o\big((d-1)^t\big).$$

Plugging the above in (5.2), we obtain that with probability $1 - o(n^{-2}e^{-d^2})$

$$|\partial B_t(x)| \ge (1-o(1))(d-1)^t, \tag{5.3}$$

and a union bound implies that (5.3) holds for all directed edges $x$ which satisfy $|\partial B_1(x)| = d-1$ except with probability $o\big(d\,n^{-1}\exp(-d^2)\big) = o(\exp(-d^2))$. By (2.1), it now follows that (5.1) also holds whp over $\mathcal G(n,d)$. ■



The following lemma, the analogue of Lemma 3.4, is proved by essentially following the same argument as in the proof of Lemma 3.4, i.e., calculating the size of the common neighborhood of two vertices. The difference is again that here we need to deal with the fact that the probability that the configuration model is a simple graph is exponentially small in $d$. This is achieved by repeating the approach, demonstrated in Lemma 5.1 above, of treating $B_1(x)$ separately. Applying this analysis to the neighborhoods of the 2 starting directed edges $x, y$ gives the required result, with the remaining arguments of Lemma 3.4 left unchanged (we omit the full details).

Lemma 5.2. Set T = ^log d „ 1 n and L = glog ci _ 1 n. Then whp, for 
every x,y 6 E with dist(x, y) > 2L and every t < T, 

\B t (x)UB t (y)\ ^rT^id-l)* . 

The final ingredient needed is the analogue of the Poissonization result of Proposition 4.6, as given by the following proposition.

Proposition 5.3. Let $\varepsilon>0$, set

$$T = \lceil\log_{d-1}(dn) + 2\log_{d-1}(1/\varepsilon)\rceil, \qquad \mu = (d-1)^T/dn,$$

and for each $x\in E$, define the random variable $Z = Z(x)$ by

$$P(Z = k) = \frac{1}{dn}\,\big|\{y\in E : \mathcal N_{T-1}(x,y) = k\}\big|,$$

where $\mathcal N_\ell(x,y)$ is the number of $\ell$-long non-backtracking paths from $x$ to $y$. Then whp, every $x$ satisfies

$$E\big[\,|(Z(x)/\mu) - 1|\ \big|\ \mathcal F_G\big] \le 2\varepsilon + \frac{3}{\log\log n},$$

where $\mathcal F_G$ is the $\sigma$-field generated by the graph $G\sim\mathcal G(n,d)$.



The proof of the above proposition is essentially the same as the proof of Proposition 4.6, with some minor adjustments to the estimates to ensure that they hold with probability $o(\exp(-d^2/4))$. The main necessary change is to let the bin sizes depend on $d$, namely to set $M = d^3\log^2 n$. As only minor adjustments to some of the bounds are required elsewhere, we omit the details.

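Proof of Theorem 3. The lower bound of $t_{\mathrm{MIX}}(s) \ge \lceil\log_{d-1}(dn)\rceil$ follows immediately from Claim 4.8, whose proof remains valid without change, even when $d$ is allowed to grow with $n$.

To obtain the upper bound, let $(W_t)$ denote the non-backtracking random walk started at $W_0 = x$. Set $\varepsilon = s/3$, and

$$T = \lceil\log_{d-1}(dn) + 2\log_{d-1}(1/\varepsilon)\rceil, \qquad \mu = (d-1)^T/dn.$$

By Proposition 5.3 we have that whp,

$$\begin{aligned}
\sum_{y\in E}\Big|P_x(W_T = y) - \frac{1}{dn}\Big| &= \sum_k \big|\{y : \mathcal N_{T-1}(x,y) = k\}\big|\,\Big|\frac{k}{(d-1)^T} - \frac{1}{dn}\Big|\\
&= \sum_k \Big|\frac{k}{\mu} - 1\Big|\,P(Z = k\mid\mathcal F_G)\\
&= E\big[\,|(Z/\mu) - 1|\ \big|\ \mathcal F_G\big] \le 2\varepsilon + o(1) < s
\end{aligned}$$

for large $n$. We conclude that $t_{\mathrm{MIX}}(s) \le T \le \lceil\log_{d-1}(dn)\rceil + 1$, since $\log_{d-1}(1/\varepsilon) = o(1)$ by our assumption on $d$. ■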

5.2. Duality between non- backtracking and simple random walks. 

The following observation is attributed to Yuval Peres:

Observation 5.4. Conditioning on being in level $k$ of the simple random walk on the tree, we are uniform over $k$-long non-backtracking random walks.

More specifically, let $\mathcal T$ be the cover tree for $G$ at $u$ with a map $\varphi$, as defined in (2.5). Let $X_t$ be a SRW on $\mathcal T$ started from $\rho$ and let $W_t = \varphi(X_t)$ be the corresponding SRW on $G$ started from $u$. Compare this with a NBRW $\tilde W_t$ started from $x = (w,u)$ where $w$ is chosen uniformly from the neighbors of $u$. For a directed edge $(y,z)$ let $\psi(\cdot)$ denote the projection $\psi((y,z)) = z$, giving the vertex the NBRW is presently situated at.

Note that, by symmetry, conditioned on $\mathrm{dist}(\rho,X_t) = k$ the random walk is uniform on the $d(d-1)^{k-1}$ points $\{w\in\mathcal T : \mathrm{dist}(\rho,w) = k\}$. By the obvious one-to-one correspondence between paths of length $k$ from $\rho$ in $\mathcal T$ and non-backtracking paths of length $k$ in $G$ from $u$, the following holds: Conditioned on $\mathrm{dist}(\rho,X_t) = k$ we have that $W_t$ is distributed as $\psi(\tilde W_k)$. Thus, if $\tilde W_t$ is mixed at time $k$ then a SRW will be mixed once its lift to the cover tree reaches distance $k$ from the root.
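The duality can be verified exactly on a small example: the law of the SRW at time $t$ should equal the mixture, over $k$, of the endpoint law of a $k$-step NBRW (with uniform first step) weighted by the probability that the tree walk is at distance $k$. The Python sketch below does this with exact rational arithmetic; the function names and the choice of test graph are illustrative assumptions, and the graph is assumed to be simple and $d$-regular.

```python
from fractions import Fraction


def srw_law(adj, u, t):
    """Exact law of the simple random walk on the graph after t steps from u."""
    law = {v: Fraction(int(v == u)) for v in adj}
    for _ in range(t):
        nxt = {v: Fraction(0) for v in adj}
        for v, mass in law.items():
            for w in adj[v]:
                nxt[w] += mass / len(adj[v])
        law = nxt
    return law


def tree_distance_law(d, t):
    """Exact law of dist(rho, X_t) for SRW on the infinite d-regular tree."""
    law = [Fraction(0)] * (t + 2)
    law[0] = Fraction(1)
    for _ in range(t):
        nxt = [Fraction(0)] * (t + 2)
        for k, mass in enumerate(law):
            if mass == 0:
                continue
            if k == 0:
                nxt[1] += mass
            else:
                nxt[k + 1] += mass * Fraction(d - 1, d)
                nxt[k - 1] += mass * Fraction(1, d)
        law = nxt
    return law


def nbrw_endpoint_laws(adj, u, d, t):
    """laws[k][v] = P(endpoint of a k-step NBRW from u is v), first step uniform."""
    laws = [{v: Fraction(int(v == u)) for v in adj}]
    counts = {(u, z): 1 for z in adj[u]}          # non-backtracking paths of length 1
    for k in range(1, t + 1):
        total = d * (d - 1) ** (k - 1)            # number of such paths of length k
        law = {v: Fraction(0) for v in adj}
        for (_, b), c in counts.items():
            law[b] += Fraction(c, total)
        laws.append(law)
        nxt = {}
        for (a, b), c in counts.items():
            for w in adj[b]:
                if w != a:                        # forbid immediate backtracking
                    nxt[(b, w)] = nxt.get((b, w), 0) + c
        counts = nxt
    return laws


def mixture_law(adj, u, d, t):
    """Mixture of NBRW endpoint laws weighted by the tree-distance law at time t."""
    dist = tree_distance_law(d, t)
    laws = nbrw_endpoint_laws(adj, u, d, t)
    return {v: sum(dist[k] * laws[k][v] for k in range(t + 1)) for v in adj}


if __name__ == "__main__":
    k4 = {v: [w for w in range(4) if w != v] for v in range(4)}  # 3-regular
    print(srw_law(k4, 0, 5) == mixture_law(k4, 0, 3, 5))
```

Using `Fraction` avoids any floating-point tolerance, so the comparison is an exact identity check rather than an approximate one.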

Proof of Corollary 4. In our proof of Theorem 1 it was shown using the Central Limit Theorem (see equation (3.4)) that the distance from the root of the walk in the cover tree is given by

$$\frac{\mathrm{dist}(X_t,\rho) - \frac{d-2}{d}\,t}{\frac{2\sqrt{d-1}}{d}\sqrt t} \xrightarrow{d} N(0,1). \tag{5.4}$$

When $d$ grows with $n$ this Gaussian approximation still holds provided the variance satisfies $\frac{4(d-1)}{d^2}\,t\to\infty$, or equivalently $t/d\to\infty$. When $d$ and $t$ are of the same order, the number of backtracking steps is asymptotically a Poisson random variable with mean $t/d$, therefore $t - \mathrm{dist}(X_t,\rho)$ is distributed as twice a $\mathrm{Po}(t/d)$ random variable. In both of these cases, whenever $t$ has order $\log_{d-1} n$, the variance of $\mathrm{dist}(X_t,\rho)$ is of order $\frac{\log n}{d\log d}$. Finally, when $t/d\to 0$, the number of backtracking steps goes to 0 as well. This understanding of $\mathrm{dist}(X_t,\rho)$ will allow us to translate our results on NBRWs into statements on SRWs.

If $w\in\mathcal T$ and $\mathrm{dist}(\rho,w)\le R$ then $\varphi(w)\in B_R$ and hence,

$$\|P(W_t\in\cdot) - \pi\|_{TV} \ge P(W_t\in B_R) - \pi(B_R) \ge P(\mathrm{dist}(X_t,\rho)\le R) - \pi(B_R).$$

In particular, as $\pi(B_R) = |B_R|/n \le O(1/d) = o(1)$ for $R\le\log_{d-1}(n) - 1$, we have that

$$\|P(W_t\in\cdot) - \pi\|_{TV} \ge P\big(\mathrm{dist}(X_t,\rho)\le\log_{d-1}(n) - 1\big) - o(1). \tag{5.5}$$

Next, let $\varrho_k = d_{TV}(\psi(\tilde W_k),\pi)$ be the total variation distance between the NBRW and the stationary distribution. According to Observation 5.4 (the correspondence between walks on the cover tree conditioned to be at distance $k$ and NBRWs of length $k$), the following holds:

$$\begin{aligned}
\|P(W_t\in\cdot) - \pi\|_{TV} &\le \sum_{k=0}^{t}\big\|P(W_t\in\cdot\mid\mathrm{dist}(X_t,\rho) = k) - \pi\big\|_{TV}\cdot P(\mathrm{dist}(X_t,\rho) = k)\\
&= \sum_{k=0}^{t}\varrho_k\,P(\mathrm{dist}(X_t,\rho) = k).
\end{aligned}$$

Now, by Theorem 3, when $k > \lceil\log_{d-1}(dn)\rceil$ we have $\varrho_k = o(1)$, hence

$$\|P(W_t\in\cdot) - \pi\|_{TV} \le P\big(\mathrm{dist}(X_t,\rho)\le\lceil\log_{d-1}(dn)\rceil\big) + o(1). \tag{5.6}$$



Equations (5.5) and (5.6) imply that mixing takes place when $\operatorname{dist}(X_t, \rho)$
is $\log_{d-1} n + O(1)$. By the above discussion on the distribution of $\operatorname{dist}(X_t, \rho)$,
this occurs when $t$ is around $\frac{d}{d-2} \log_{d-1} n$ with window $\sqrt{\frac{\log n}{d \log d}}$.



It remains to address the case where $d \, \frac{\log\log n}{\log n} \to \infty$. Notice that here,
as the probability of the SRW on $G$ making a backtracking step is $1/d$, the
probability of backtracking anywhere in its first $\lceil \log_{d-1}(dn) \rceil + 1$ steps is $o(1)$.
Hence, we can couple the SRW and NBRW in their first $\lceil \log_{d-1}(dn) \rceil + 1$
steps whp, implying they have the same mixing time. In particular, we may
conclude that for any fixed $0 < s < 1$, the worst-case total-variation mixing
time of the SRW on $G$ whp satisfies

$$t_{\mathrm{MIX}}(s) \in \big\{ \lceil \log_{d-1}(dn) \rceil, \; \lceil \log_{d-1}(dn) \rceil + 1 \big\},$$

as required. ■
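
To get a feel for the quantities appearing in this proof, the sketch below evaluates them for concrete $n$ and $d$: the leading-order SRW cutoff location $\frac{d}{d-2}\log_{d-1} n$, the order of its window $\sqrt{\log n/(d\log d)}$ (constants ignored), the number of steps $\lceil \log_{d-1}(dn) \rceil + 1$ used in the coupling, and a crude union bound (number of steps divided by $d$) on the probability that the SRW backtracks within those steps. All function names are hypothetical, and the values are only illustrative since the statements in the text are asymptotic.

```python
import math


def log_base(x, b):
    """Logarithm of x in base b."""
    return math.log(x) / math.log(b)


def srw_cutoff_location(n, d):
    """Leading-order SRW cutoff location (d/(d-2)) * log_{d-1} n."""
    return d / (d - 2) * log_base(n, d - 1)


def srw_window_order(n, d):
    """Order of the SRW cutoff window, sqrt(log n / (d log d)), constants ignored."""
    return math.sqrt(math.log(n) / (d * math.log(d)))


def coupling_steps(n, d):
    """ceil(log_{d-1}(dn)) + 1, the number of steps in the SRW/NBRW coupling."""
    return math.ceil(log_base(d * n, d - 1)) + 1


def backtrack_union_bound(n, d):
    """Union bound: each step backtracks with probability at most 1/d."""
    return min(1.0, coupling_steps(n, d) / d)


if __name__ == "__main__":
    n = 10**9
    for d in (3, 10, 100, 10**4):
        print(
            d,
            round(srw_cutoff_location(n, d), 2),
            round(srw_window_order(n, d), 3),
            coupling_steps(n, d),
            round(backtrack_union_bound(n, d), 4),
        )
```

The union bound is only informative once $d$ is large compared with $\log n / \log d$, which matches the regime in which the coupling argument is applied.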



6. Concluding remarks and open problems 

• We have established the cutoff phenomenon for SRWs and NBRWs on
almost every $d$-regular graph on $n$ vertices, where $3 \le d \le n^{o(1)}$ (beyond
which the mixing time is $O(1)$ and we cannot have cutoff). For both
walks, we obtained the precise cutoff location and window:

1. For the SRW, the cutoff point is whp at $\frac{d}{d-2} \log_{d-1} n$, and in fact,
we obtained the two leading order terms of $t_{\mathrm{MIX}}(s)$ for any fixed $s$.

2. For the NBRW, cutoff occurs at $\log_{d-1}(dn)$ whp ($\frac{d}{d-2}$ times faster
than the SRW) with an $O(1)$ window. Moreover, for large $d$, the
entire mixing transition takes place within a 2-step cutoff window.

• Given our discussion in Section 1 on expander graphs (and the
product-criterion for cutoff), it would be interesting to extend our results to an
arbitrary family of expanders. While one may design such graphs where
the SRW has no cutoff, such constructions seem highly asymmetric, and 
the following conjecture seems plausible (see also (TsJ Question 5.2]): 
Conjecture 6.1. The SRW on any family of vertex-transitive expander
graphs exhibits cutoff. 

• Similarly, recalling the above comparison of $t_{\mathrm{MIX}}$ of the SRW and the
NBRW on random regular graphs, it would be interesting to extend this 
result to any family of vertex-transitive expander graphs. 